Re: [HACKERS] pg_autovacuum next steps
Tom Lane wrote: Matthew T. O'Connor [EMAIL PROTECTED] writes: I probably said that wrong, but how do backends get their stats data? They read it out of a flat file that the stats collector rewrites every so often.

Ok, so that would be easy to do (if we decide we want to).

Is that really worth the loss of independence? I'm not sure one way or the other myself. I suppose the autovac daemon could still be run by hand for testing purposes, but it would have to run as user postgres and on the same machine as the postmaster.

I'm not sure; it would allow autovacuum to check the stats more regularly. I suppose it would be possible for pg_autovacuum to recognize whether it's been launched by the postmaster or not and look up stats as appropriate, but as you say, I'm not sure it's worth it, and my first cut will work exactly as the current pg_autovacuum does.

Also, you didn't mention if I will be able to use the backend logging functions; I would guess that I can, but I'm not totally sure.

Yes, if you are launched by the postmaster then you'd have access to its stderr etc, so you could just log via elog/ereport. Again though this puts you in a situation where the daemon *must* be launched by the postmaster or it won't work. How much arm's-length separation are you willing to give up?

Well, I think I would be more willing to give up the separation for logging purposes. I would think that an error message issued by a postmaster sub-process should wind up in the same location as an error issued by the postmaster proper. Also, people have requested lots of logging options such as syslog etc, and I think it would be nice to get all this for free. I would think that pg_autovacuum could determine if it's a postmaster sub-process and log appropriately; does that sound reasonable?

---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] pg_autovacuum next steps
On Mon, 22 Mar 2004, Matthew T. O'Connor wrote: On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote: C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. I'd be happy to help you out. Agreed.

Ok, thanks for the offer to help, but I think I understated things above when I said I'll need a little help :-) I haven't looked at the code but... I have a few big-picture questions. Once pg_autovacuum is launched as a postmaster sub-process, what changes? That is, currently pg_autovacuum uses libpq to connect to a database and issue queries, including a vacuum / analyze command when needed. After becoming a subprocess, will (should) it still use libpq to connect to the databases? I don't think so...

It could use libpq but most definitely shouldn't.

Is it even possible to do that? If not, how will it check out the stats of all the different databases? I guess it should fork() a new backend, connect to it somehow, and use it to query the database, but I'm really not sure how this works.

It can interact with the stats collector (a separate process) in the same way that existing backends interact: through a domain socket.

I'm looking through the backend startup code to see how the stats collector and the bgwriter work, since they are probably two semi-close examples of what I'll have to do.

I think the checkpoint process does something similar in that it issues a checkpoint command. The vacuum backend will call vacuum() (or something very like it) directly. I imagine that when it gets called, and on which table, will be based upon the existing algorithm.

Thanks, Gavin
Re: [HACKERS] pg_autovacuum next steps
Ok, thanks for the offer to help, but I think I understated things above when I said I'll need a little help :-) I haven't looked at the code but... I have a few big-picture questions. Once pg_autovacuum is launched as a postmaster sub-process, what changes? That is, currently pg_autovacuum uses libpq to connect to a database and issue queries, including a vacuum / analyze command when needed. After becoming a subprocess, will (should) it still use libpq to connect to the databases? I don't think so...

It could use libpq but most definitely shouldn't.

Is it even possible to do that? If not, how will it check out the stats of all the different databases? I guess it should fork() a new backend, connect to it somehow, and use it to query the database, but I'm really not sure how this works.

It can interact with the stats collector (a separate process) in the same way that existing backends interact: through a domain socket.

I'm looking through the backend startup code to see how the stats collector and the bgwriter work, since they are probably two semi-close examples of what I'll have to do.

I think the checkpoint process does something similar in that it issues a checkpoint command. The vacuum backend will call vacuum() (or something very like it) directly. I imagine that when it gets called, and on which table, will be based upon the existing algorithm.

One point is this: vacuum() assumes that you are running in a fully fledged backend. There'd be a fair bit of work involved in allowing a single process to call vacuum() against multiple databases. As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around. Is there a better approach than this?
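The worker lifecycle Gavin describes can be sketched in miniature (an illustrative Python model, not PostgreSQL source; every name here is hypothetical): a per-database vacuum worker keeps running while its database has ordinary backends, and retires itself with a final vacuum once it doesn't.

```python
# Hedged sketch of the per-database worker lifecycle: keep vacuuming
# while the database is in use; once no active backends remain, do a
# final vacuum (if needed) and exit so idle workers don't accumulate.

def worker_tick(db, active_backends, needs_vacuum, do_vacuum):
    """One iteration of the worker's loop.

    Returns "continue" to keep running, "exit" after retiring.
    """
    if active_backends(db) > 0:
        if needs_vacuum(db):
            do_vacuum(db)
        return "continue"
    # No active backends: final vacuum if necessary, then exit.
    if needs_vacuum(db):
        do_vacuum(db)
    return "exit"
```

The callbacks (`active_backends`, `needs_vacuum`, `do_vacuum`) stand in for checks against shared state and the actual vacuum() call.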
Gavin
Re: [HACKERS] pg_autovacuum next steps
On Mon, Mar 22, 2004 at 02:35:37AM -0500, Matthew T. O'Connor wrote: On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote: C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. I'd be happy to help you out. Agreed. Ok, thanks for the offer to help, but I think I understated things above when I said I'll need a little help :-) I have a few big-picture questions. Once pg_autovacuum is launched as a postmaster sub-process, what changes?

All of it. It's important to do it as a backend process, because libpq has very limited and slow facilities for working with backend internals. The base should be a standard backend with a different main loop that, instead of checking a socket, checks some shared information about tables and calls the vacuum code directly. That way you can skip working with connections, the parser, etc.

I thought about it in the last few days, and I found Tom's idea about FSM table usage perfect: What I had in the back of my mind was: each backend counts attempted insertions and deletions in its relcache entries (an update adds to both counts). At transaction commit or abort, we know which of these two counts represents the number of dead tuples added to each relation, so while we scan the relcache for post-xact cleanup (which we will be doing anyway) we can transfer the correct count into the shared FSM entry for the relation. This gives us a reasonably accurate count in shared memory of all the tuple obsoletions since bootup, at least for heavily-used tables. (The FSM might choose to forget about lightly-used tables.) The auto vacuumer could look at the FSM numbers to decide which tables are highest priority to vacuum. (2002-09-03 08:10:32)

I looked at the code and I think extending the FSM tables will be pretty simple, but I'm unsure how Tom envisioned the relcache counters. Tom?
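The scheme Tom describes in the quoted passage can be modeled roughly as follows (an illustrative Python sketch, not backend code; `shared_dead` stands in for the shared FSM entries, and the helper names are invented). The key observation is that on commit the deleted count is the number of newly dead tuples, while on abort it is the inserted count.

```python
# Sketch of per-backend dead-tuple accounting: each backend keeps
# insert/delete counts per relation (an UPDATE bumps both), and at
# transaction end the appropriate count is folded into a shared
# per-relation total during relcache cleanup.

class RelCounters:
    def __init__(self):
        self.inserted = 0
        self.deleted = 0

shared_dead = {}  # stand-in for the shared FSM entry per relation

def record_update(counters):
    # An UPDATE obsoletes the old tuple and creates a new one,
    # so it adds to both counts.
    counters.inserted += 1
    counters.deleted += 1

def end_xact(rel, counters, committed):
    # Commit: the deleted tuples are now dead.
    # Abort: the inserted tuples are now dead.
    dead = counters.deleted if committed else counters.inserted
    shared_dead[rel] = shared_dead.get(rel, 0) + dead
    counters.inserted = counters.deleted = 0
```

An autovacuum daemon could then rank relations by `shared_dead` to pick vacuum targets, as the quoted idea suggests.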
Karel -- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/
Re: [HACKERS] pg_autovacuum next steps
On Monday 22 March 2004 03:36, Matthew T. O'Connor wrote: 1. Per-database defaults and per-table thresholds: There are differing opinions as to the best way to provide this feature. The primary debate is where to save the configuration data. I see three options:

1. Store config data inside a special pg_autovacuum table inside any existing database that wants custom settings.
2. Use a config file. This would require some additional coding to add the required parsing, but is possible.
3. Create a pg_autovacuum database inside any cluster that wants to customize its settings.

Since many people do not like tools that clutter their databases by adding tables, I think option 1 (adding a pg_autovacuum table to existing databases) is right out. Using a config file would be OK, but would require additional parsing code. My preference is option 3.

I've nothing against #3 as a default, but can I put in a suggestion for 1 & 3, or rather some setting definable at runtime/build-time that lets you select the database + schema for autovacuum to find its config data. I might be wrong, but it strikes me as the sort of thing people running shared environments will want to choose for themselves.

-- Richard Huxton Archonet Ltd
Re: [HACKERS] pg_autovacuum next steps
Gavin Sherry wrote: On Sun, 21 Mar 2004, Matthew T. O'Connor wrote: On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote: I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. Yeah - why delay making it a backend process? :)

Ok, well this was part of the reason to have this conversation. My reasons: A) I wasn't sure if people really thought this was ready to be integrated. Tom had said a while ago that it was good to keep it as a contrib module while it's still actively being developed.

I was talking to Jan about some other work on VACUUM related to more intelligent vacuuming. Namely, maintaining a map (outside of shared memory) of blocks which have been pushed out of the free space map for VACUUM to visit (which requires a backend process), being aware of load restrictions (i.e., allowing the user to only vacuum when the load average is less than X, for example), and some other leveling stuff to ensure that availability is consistent. Whilst this doesn't relate to pg_autovacuum specifically, it'd be great if they could be released at the same time, I think.

I don't recall the "outside of shared memory" part. Anyhow, the whole story goes like this: Maintain 2 bits per block that tell if the block has been vacuumed of all dead tuples since the last time it was dirtied, and if all its tuples are completely frozen. If those two conditions are true, there is no need to vacuum that block at all. (Red Flag!!! On further thinking I realized that this assumes that the FSM is loss free.) With a default 8K blocksize, this means 32K per 1GB segment, making 4 additional blocks. I actually think that these extra blocks should be somehow part of the heap files, so that they are subject to the regular buffer management.
To keep the lock contention on them low, vacuum and backends will set/clear new flags in the bufhdr flags member. That way, the bgwriter and checkpointer will be the usual suspects to set/clear these flags in the shared bitmap array stored in the extra blocks. As to where to store these blocks, some block number arithmetic magic comes to mind. That way a block's relnode and blockno automatically lead to the bits, even in the case of blind writes.

Jan
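Jan's space arithmetic checks out; a quick sketch of the numbers, assuming the default 8K block size and 1GB heap segment files:

```python
# Two status bits per heap block: how much bitmap space does a
# 1GB segment need, and how many extra 8K blocks does that make?

BLCKSZ = 8192                               # default block size
SEG_BYTES = 1 << 30                         # 1GB segment

blocks_per_seg = SEG_BYTES // BLCKSZ        # 131072 blocks
bitmap_bits = 2 * blocks_per_seg            # 2 bits per block
bitmap_bytes = bitmap_bits // 8             # 32768 bytes = "32K"
extra_blocks = bitmap_bytes // BLCKSZ       # 4 additional blocks
```

So the "32K per 1GB segment, making 4 additional blocks" figure follows directly from 2 bits per 8K block.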
Re: [HACKERS] pg_autovacuum next steps
On Mon, 2004-03-22 at 03:36, Gavin Sherry wrote: One point is this: vacuum() assumes that you are running in a fully fledged backend. There'd be a fair bit of work involved in allowing a single process to call vacuum() against multiple databases.

I can't imagine we want to do that.

As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around.

I don't understand; are you saying that upon postmaster startup a vacuum backend should be forked for every database in the cluster?

Is there a better approach than this?

One thought I had is that there wouldn't be a persistent daemon-type process; rather, the autovacuum process, like the checkpoint process, would be kicked off periodically. It would initially be attached to template1, then when it's done there it will figure out the next database to look at and fork; the existing template1 process will exit, and the new process will be connected to the next database. Just repeat this forking loop until done. Thoughts?
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor [EMAIL PROTECTED] writes: There are differing opinions as to the best way to provide this feature. The primary debate is where to save the configuration data. I see three options:

1. Store config data inside a special pg_autovacuum table inside any existing database that wants custom settings.
2. Use a config file. This would require some additional coding to add the required parsing, but is possible.
3. Create a pg_autovacuum database inside any cluster that wants to customize its settings.

Since many people do not like tools that clutter their databases by adding tables, I think option 1 (adding a pg_autovacuum table to existing databases) is right out. Using a config file would be OK, but would require additional parsing code. My preference is option 3.

Personally I like #2. The claim that this requires extra code seems bogus to me --- when you are working at the C code level, reading a table takes code too. I think both 1 and 3 represent clutter. Also, they will introduce entirely new sets of problems; we already know how to deal with these in the case of config files, but have no experience with config tables. Some examples: how do you get the daemon to notice that you changed the config (SIGHUP for config files, ?? otherwise)? Can you easily get at the config data from a daemon process regardless of which database it's attached to (if any)? If the config data is in a table, what sorts of interesting problems will arise from trying to vacuum that table itself?

Basically I see config tables as cutting against the grain of everything we've done to date in Postgres DBA-level configuration, and I do not see a reason to justify that sort of reinvention.

regards, tom lane
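The parsing burden of option 2 really is modest, as Tom argues. A minimal sketch of what a per-table config file and its parser might look like (the file layout, field names, and threshold semantics are all invented for illustration, not an existing pg_autovacuum format):

```python
# Hypothetical per-table config format, one entry per line:
#   dbname.schema.table  vacuum_base  vacuum_scale
SAMPLE = """
# tables with custom autovacuum settings
mydb.public.pgbench_accounts  1000  0.5
"""

def parse_conf(text):
    """Parse the hypothetical config format into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        if not line:
            continue                          # skip blank lines
        table, base, scale = line.split()
        settings[table] = (int(base), float(scale))
    return settings
```

A SIGHUP handler in the daemon would simply re-run `parse_conf` on the file, which is the reload story Tom points to.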
Re: [HACKERS] pg_autovacuum next steps
Gavin Sherry [EMAIL PROTECTED] writes: One point is this: vacuum() assumes that you are running in a fully fledged backend. There'd be a fair bit of work involved in allowing a single process to call vacuum() against multiple databases.

Make that "it isn't going to happen."

As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around.

Lots of idle processes sitting around is right out, too. Remember that each one would eat a backend connection slot. I think we are going to have to limit this to *one* process at a time. What that probably means is that we successively launch an autovacuum process against each database; it does whatever seems appropriate in that database and then quits. We could manage this just like checkpoints are presently managed --- the only thing the postmaster has to know is the desired idle period between the end of one autovacuum and the start of the next.

regards, tom lane
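The checkpoint-style management Tom describes reduces to a very small decision the postmaster has to make each time through its loop (a hedged sketch; the real logic would of course live in C in the postmaster, and these names are illustrative):

```python
# One-at-a-time autovacuum scheduling: the postmaster only tracks
# when the last child finished and the desired idle period between
# the end of one run and the start of the next.

def should_launch(now, last_finished, idle_period, child_running):
    """Decide whether to launch the single autovacuum child."""
    if child_running:
        return False  # never more than one autovacuum process
    return now - last_finished >= idle_period
```

Because the clock starts at the *end* of the previous run, a slow pass over many databases can never cause runs to pile up.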
Re: [HACKERS] pg_autovacuum next steps
On Mon, 2004-03-22 at 07:25, Richard Huxton wrote: On Monday 22 March 2004 03:36, Matthew T. O'Connor wrote: 1. Store config data inside a special pg_autovacuum table inside existing databases that want custom settings. 2. Use a config file. This would require some additional coding to add the required parsing, but is possible. 3. Create a pg_autovacuum database inside any cluster that wants to customize its settings. [...] My preference is option 3.

I've nothing against #3 as a default, but can I put in a suggestion for 1 & 3, or rather some setting definable at runtime/build-time that lets you select the database + schema for autovacuum to find its config data. I might be wrong, but it strikes me as the sort of thing people running shared environments will want to choose for themselves.

If pg_autovacuum were being designed to live forever as a client app, then I agree admins having a choice would be good. But as we are going to eventually move any autovacuum data and settings into the system tables (when autovacuum is part of the system), I don't see the need to expend the extra cycles, especially since people seem to be pushing hard for autovacuum to be a backend function sooner rather than later.
Re: [HACKERS] pg_autovacuum next steps
On Mon, 2004-03-22 at 04:23, Karel Zak wrote: All of it. It's important to do it as a backend process, because libpq has very limited and slow facilities for working with backend internals.

Agreed.

The base should be a standard backend with a different main loop that, instead of checking a socket, checks some shared information about tables and calls the vacuum code directly. In this case you can omit work with connections, the parser, etc.

So am I to understand I can start up a postmaster subprocess and then be able to monitor the activity of all the databases? I guess that makes sense, since I would be talking to the stats collector directly via a socket. But that doesn't solve the problem of issuing the vacuum to different databases; I would still create a new backend for every database that needs a vacuum or analyze issued. Also, we don't want to launch multiple simultaneous vacuums, so we want the commands to be serialized (I know some people want to be able to do this if databases are located on different disks, but for now I'm keeping things simple). I have an idea for this that I just mentioned in another message to the list.

I thought about it in the last few days, and I found Tom's idea about FSM table usage perfect:

There has been lots of discussion of incorporating FSM data into the autovacuum decision process. I am interested in exploring this, but since I'm already biting off more than I can easily chew, I am going to try and leave the decision-making process the same for now. BTW, I think we need to use both tools (stats and FSM) since not all tables will be in the FSM; an insert-only table still needs to be analyzed periodically, and a lightly updated table will eventually need to be vacuumed.
Re: [HACKERS] pg_autovacuum next steps
On Mon, 2004-03-22 at 10:51, Tom Lane wrote: 2. Use a config file. This would require some additional coding to add the required parsing, but is possible. Personally I like #2. The claim that this requires extra code seems bogus to me --- when you are working at the C code level, reading a table takes code too. I think both 1 and 3 represent clutter. Also, they will introduce entirely new sets of problems that we already know how to deal with in the case of config files, but have no experience with for config tables. Some examples: how do you get the daemon to notice that you changed the config (SIGHUP for config files, ?? otherwise)? Can you easily get at the config data from a daemon process regardless of which database it's attached to (if any)? If the config data is in a table, what sorts of interesting problems will arise from trying to vacuum that table itself?

I guess what I mean by extra code is extra work, since I am comfortable storing data in tables and would need to do some work to put it in a config file. Some of this conversation would change if we really are going to integrate autovacuum into the backend. My suggestion for storing config data was really assuming that we would continue to be a client app for at least one more release cycle, but I think I'm hearing that people want it in the backend now.

Assuming that we are being integrated into the backend, and assuming that an autovacuum process is fired off periodically by the postmaster, I think some config items should become GUC options and others will become part of the system tables. For example: enable_autovacuum should be a GUC option, but the vacuum threshold for a given table should probably just be a new column in the pg_class table. Any data stored in the system tables will be checked every time the autovacuum process is run, so there won't be any problems with noticing that the admin changed the threshold for a given table.
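The fallback Matthew proposes could look like this in miniature (a hedged sketch; the GUC name `autovacuum_vacuum_threshold` and the idea of a nullable per-table pg_class column are assumptions drawn from this proposal, not existing settings):

```python
# A per-table threshold (hypothetical pg_class column) overriding a
# cluster-wide GUC default; NULL/None in the column means "use the GUC".

GUC = {
    "enable_autovacuum": True,
    "autovacuum_vacuum_threshold": 1000,  # assumed cluster-wide default
}

def effective_threshold(per_table_value):
    """Resolve the threshold for one table."""
    if per_table_value is None:
        return GUC["autovacuum_vacuum_threshold"]
    return per_table_value
```

Since the autovacuum process would re-read the catalog each run, a changed per-table value takes effect on the next pass with no signaling needed, which is the point Matthew makes above.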
Basically I see config tables as cutting against the grain of everything we've done to date in Postgres DBA-level configuration, and I do not see a reason to justify that sort of reinvention.

I agree if we are talking about a client app, but I think the scheme laid out above is consistent for something that is part of the backend; do you disagree?

Thanks, Matthew O'Connor
Re: [HACKERS] pg_autovacuum next steps
On Sun, Mar 21, 2004 at 05:32:59PM -0500, Matthew T. O'Connor wrote: Lately I have been thinking about the next steps for the pg_autovacuum daemon. I have written up a document that describes what I'm planning to do next. Please read the attached and respond, as I would really like some feedback.

Hi, Matthew. For our uses, we found that pg_autovacuum did not behave as expected with vacuum_threshold set to 0. For our purposes, we have a very good idea of how many tuples need to be slurped up over a given interval, and would like the autovacuum daemon to simply go through and vacuum when it hits a, er, threshold. Obligatory explanation: We were doing this with cron, but when something gets really ugly, cron is not as intuitive as autovacuum. Having this in the postmaster would be fine by me (and in fact, I would even suggest that it belongs there), but I really need to have more flexibility or we are going to (read: I will be tasked with) be writing our own. Sorry for the terseness -- and I haven't had a chance to go over the whole thread; we're moving from 7.3.2 to 7.4.2 on the big database this week.

Alex -- [EMAIL PROTECTED] Alex J. Avriette, Systems Programmer "You have a spell checker. It is called a brain. Please use it." - Chris Lyth
Re: [HACKERS] pg_autovacuum next steps
As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around.

Lots of idle processes sitting around is right out, too. Remember that each one would eat a backend connection slot. I think we are going to have to limit this to *one* process at a time. What that probably means is that we successively launch an autovacuum process against each database; it does whatever seems appropriate in that database and then quits. We could manage this just like checkpoints are presently managed --- the only thing the postmaster has to know is the desired idle period between the end of one autovacuum and the start of the next.

*slaps hand on forehead* Yes, this is the best approach. So, do we want a static time, a GUC-controlled time, or some time which is modified by pg_autovacuum's / the stats collector's knowledge of the amount of work which goes on in any given database?

Gavin
Re: [HACKERS] pg_autovacuum next steps
Tom Lane wrote: From the point of view of the postmaster a GUC-controlled delay would seem like the best thing. We could discuss having the autovacuum code try to feed back adjustments in the delay, but remember that one of the golden virtues for the postmaster proper is simplicity; that translates directly to robustness. We don't want the postmaster engaging in anything complicated that could potentially lock it up or crash it due to a bug.

Agreed. [snip...]

This nice-sounding arrangement is probably not directly workable because of the fact that the postmaster has no good way to know about or control backends if they aren't its direct children. Perhaps the autovac daemon *should* use libpq, that is, not fork backends but connect via the postmaster each time it wants to run a backend. Then the backends are ordinary children of the postmaster and everything acts normally. (This could amount to using the existing autovac code, and simply adding a frammish to the postmaster to autospawn the autovac daemon as a non-backend child process.)

Well this certainly sounds like something that would be easy to do, which appeals to me at least as a first cut. Question: Does this mean that I lose many of the advantages of being in the backend? That is, would pg_autovacuum still be able to use facilities such as GUC, error reporting, access to the FSM, direct pipe connection to the stats collector etc?
Re: [HACKERS] pg_autovacuum next steps
Alex J. Avriette wrote: Hi, Matthew. For our uses, we found that pg_autovacuum did not behave as expected with vacuum_threshold set to 0. For our purposes, we have a very good idea of how many tuples need to be slurped up over a given interval, and would like the autovacuum daemon to simply go through and vacuum when it hits a, er, threshold.

Could you please explain this better? I don't really understand what the problem is. If you want pg_autovacuum to perform a vacuum on a table that has had exactly X updates no matter what, you can just run it with -V0 -vX (where X is the vacuum threshold); the same thing can be done for analyze. Are you saying that you did this and it did not work as expected?

Having this in the postmaster would be fine by me (and in fact, I would even suggest that it belongs there), but I really need to have more flexibility or we are going to (read: I will be tasked with) be writing our own.

Everyone agrees it should be in the backend eventually; the questions were just whether it was ready, and how or if it should change when it goes in.

Sorry for the terseness -- and I haven't had a chance to go over the whole thread; we're moving from 7.3.2 to 7.4.2 on the big database this week.

No problem, but I need more information to diagnose your problem. Matthew
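For reference, the contrib daemon's vacuum decision that Matthew is describing is threshold-based: a table is vacuumed once the tuples obsoleted since the last vacuum exceed a base value plus a scale factor times the table size. A sketch (parameter names are illustrative, but the formula matches the -v base / -V scale flags mentioned above):

```python
# pg_autovacuum-style threshold check: vacuum once the number of
# tuples obsoleted since the last vacuum exceeds base + scale * size.

def vacuum_threshold(base, scale, reltuples):
    return base + scale * reltuples

def needs_vacuum(n_dead, base, scale, reltuples):
    return n_dead > vacuum_threshold(base, scale, reltuples)
```

With -V0 -vX the scale factor is zero, so the table is vacuumed once more than X tuples have been obsoleted, regardless of table size, which is the behavior Matthew suggests for Alex's fixed-interval workload.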
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor [EMAIL PROTECTED] writes: Well this certainly sounds like something that would be easy to do, which appeals to me at least as a first cut. Question: Does this mean that I lose many of the advantages of being in the backend? That is, would pg_autovacuum still be able to use facilities such as GUC, error reporting, access to the FSM, direct pipe connection to the stats collector etc?

If you aren't a backend then you couldn't safely access shared memory, including FSM in particular. I see no reason you couldn't use GUC though. There is no direct pipe connection to the stats collector, except in the output direction, which is not what you want, so I'm not seeing your point there.

I am not sure that lack of FSM access is a showstopper, though. We could easily imagine inventing backend commands to read out whatever info you want from FSM, so you could request the info from your connected backend.

The more I think about this the more I like it --- it keeps the autovac control code still at arm's length from the backend, which will surely ease development and experimentation. I suppose there is some overhead in pushing data back and forth over the FE/BE protocol, but surely that would be negligible next to the real work of vacuuming.

regards, tom lane
Re: [HACKERS] pg_autovacuum next steps
Tom Lane wrote: If you aren't a backend then you couldn't safely access shared memory, including FSM in particular. I see no reason you couldn't use GUC though. There is no direct pipe connection to the stats collector, except in the output direction which is not what you want, so I'm not seeing your point there.

I probably said that wrong, but how do backends get their stats data? Meaning, when I do a select * from pg_stat_all_tables, how is the backend getting that data? The reason I'm interested is that if I don't have to fire up a backend just to check the stats, that would reduce the load associated with the autovacuum daemon. Someone earlier in the thread seemed to imply there was a way to do this.

I am not sure that lack of FSM access is a showstopper, though. We could easily imagine inventing backend commands to read out whatever info you want from FSM, so you could request the info from your connected backend.

Yeah, I agree, and for phase 1 we can just continue working only on stats data.

The more I think about this the more I like it --- it keeps the autovac control code still at arm's length from the backend which will surely ease development and experimentation. I suppose there is some overhead in pushing data back and forth over the FE/BE protocol, but surely that would be negligible next to the real work of vacuuming.

Right, I think the overhead would be negligible. Since you seem to think this is (or at least might be) a good idea, I will go ahead and try to get the postmaster to fire up the autovacuum daemon, so that the 1st cut will basically be pg_autovacuum exactly as it stands now, just launched by the postmaster. Also, you didn't mention if I will be able to use the backend logging functions; I would guess that I can, but I'm not totally sure.
Thanks again, Matthew
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor [EMAIL PROTECTED] writes: Tom Lane wrote: There is no direct pipe connection to the stats collector, I probably said that wrong, but how do backends get their stats data? They read it out of a flat file that the stats collector rewrites every so often. Meaning, when I do a select * from pg_stat_all_tables, how is the backend getting that data? The reason I'm interested is that if I don't have to fire up a backend just to check the stats, that would reduce the load associated with the autovacuum daemon. Is that really worth the loss of independence? I'm not sure one way or the other myself. I suppose the autovac daemon could still be run by hand for testing purposes, but it would have to run as user postgres and on the same machine as the postmaster. Right, I think the overhead would be negligible. Since you seem to think this is (or at least might be) a good idea, I will go ahead and try to get the postmaster to fire up the autovacuum daemon. So the first cut will basically be pg_autovacuum exactly as it stands now, just launched by the postmaster. Also, you didn't mention whether I will be able to use the backend logging functions; I would guess that I can, but I'm not totally sure. Yes, if you are launched by the postmaster then you'd have access to its stderr etc, so you could just log via elog/ereport. Again, though, this puts you in a situation where the daemon *must* be launched by the postmaster or it won't work. How much arm's-length separation are you willing to give up? regards, tom lane
Re: [HACKERS] pg_autovacuum next steps
On Mon, 22 Mar 2004, Tom Lane wrote: Matthew T. O'Connor [EMAIL PROTECTED] writes: Well this certainly sounds like something that would be easy to do, which appeals to me at least as a first cut. Question: Does this mean that I lose many of the advantages of being in the backend? That is, would pg_autovacuum still be able to use facilities such as GUC, error reporting, access to the FSM, direct pipe connection to the stats collector etc? [snip] The more I think about this the more I like it --- it keeps the autovac control code still at arm's length from the backend, which will surely ease development and experimentation. I suppose there is some overhead in pushing data back and forth over the FE/BE protocol, but surely that would be negligible next to the real work of vacuuming. I was initially against the idea of using libpq but it's growing on me too. I think it would be good if the core functions of pg_autovacuum (threshold algorithms, connection handling, issuing commands) could be (re)designed so that not only the backend but also a stripped-down pg_autovacuum binary, usable manually, can link against them. That way users have a choice and a workaround if there are any issues with the backend model. Also useful for debugging. Thanks, Gavin ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor wrote: * Inability to customize thresholds on a per table basis I ran headlong into this one. IMHO fixing this is critical. * Inability to set default thresholds on a per database basis * Inability to exclude specific databases / tables from pg_autovacuum monitoring These would be nice to have, but less critical than #1, I think. * Inability to schedule vacuums during off-peak times This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal. I'm not sure how to address all of these concerns, or that they all should be addressed right now. One of my big questions is backend integration. I am leaning towards leaving pg_autovacuum as a client application in contrib for one more release. During this time, I can continue to tweak and improve pg_autovacuum so that we will have a very good idea what the final product should be before we make it a standard backend process. I really think pg_autovacuum ought to get folded into the backend now, for 7.5. I haven't had time yet to read the entire thread, but I saw others making the same comment. It would make some of the listed problems go away, or at least become far easier to deal with. For PostgreSQL 7.5, I plan to implement these new features: 1. Per database defaults and per table thresholds (including total exclusion) Great! 2. Persistent data 3. Single-pass mode (external scheduling from cron etc...) 4. Off-peak scheduling Great again! 1. Per Database Defaults and Per Table Thresholds: 1. Store config data inside a special pg_autovacuum table inside existing databases that want custom settings. A natural if folded into the backend. 3. Single-Pass Mode (External Scheduling): I have received requests to be able to run pg_autovacuum only on request (not as a daemon), making only one pass over all the tables (not looping indefinitely). 
The advantage being that it will operate more like the current vacuum command, except that it will only vacuum tables that need to be vacuumed. This feature could be useful as long as pg_autovacuum exists outside the backend. If pg_autovacuum gets integrated into the backend and gets automatically started as a daemon during startup, then this option will no longer make sense. It still might make sense. You could have a mode where the daemon essentially sleeps forever, until explicitly woken up by a signal. When woken, it makes one pass, and goes back to infinite sleep. Then provide a simple way to signal the autovacuum process -- maybe an extension of the current VACUUM syntax. 4. Off-Peak Scheduling: A fundamental advantage of our vacuum system is that the work required to reclaim table space is taken out of the critical path and can be moved to an off-peak time when cycles are less precious. One of the drawbacks of the current pg_autovacuum is that it doesn't have any way to factor this in. In its simplest form (which I will implement first) I would add the ability to add a second set of thresholds that will be active only during an off-peak time that can be specified in the pg_autovacuum database, perhaps in a general_settings table. I don't know how this would work, but it is for sure important. In the recent testing I found that pg_autovacuum (well, lazy vacuum in general, but I was using pg_autovacuum to control it) made a huge difference in performance of batch transactions. They ranged from 4-5 seconds without vacuum running, to as high as 15 minutes with vacuum running. With the vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as high as 10 minutes. Backing vacuum off any more than that caused it to fall behind the transaction rate unrecoverably. But as I said above, if the transactions could complete in 4-5 seconds without vacuum running, and vacuuming resumed for the 3-to-4 minutes between batches, all would be well. 
Joe ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
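Joe's wake-on-signal idea above can be sketched in a few lines. This is purely an illustration, not pg_autovacuum or backend code; the class name, the use of SIGUSR1, and the polling loop are all assumptions (a real daemon would block in a sigsuspend()-style wait rather than poll a flag):

```python
import signal

class AutovacDaemon:
    """Hypothetical sketch: a daemon that idles until signaled,
    then makes exactly one vacuum pass and goes back to sleep."""

    def __init__(self):
        self.passes = 0
        self._woken = False
        # Assumption: SIGUSR1 is the wake-up signal.
        signal.signal(signal.SIGUSR1, self._wake)

    def _wake(self, signum, frame):
        # Signal handler: just record that a pass was requested.
        self._woken = True

    def run_once_if_woken(self):
        # Stand-in for the daemon's main wait loop: act only if signaled.
        if self._woken:
            self._woken = False
            self.single_pass()

    def single_pass(self):
        # One pass over all tables, vacuuming only those over threshold.
        self.passes += 1
```

A "VACUUM AUTO"-style SQL command, as suggested later in the thread, would then simply deliver this signal to the daemon process.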
Re: [HACKERS] pg_autovacuum next steps
Joe Conway wrote: Matthew T. O'Connor wrote: * Inability to schedule vacuums during off-peak times This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal. I'm not sure what you are suggesting here. As it stands right now, pg_autovacuum just issues a standard vacuum command, so there isn't anything pg_autovacuum can do until that command completes. There has been a lot of work going on trying to reduce the performance impact associated with a vacuum (vacuum delay, ARC etc); hopefully that will make a big difference. I really think pg_autovacuum ought to get folded into the backend now, for 7.5. I haven't had time yet to read the entire thread, but I saw others making the same comment. It would make some of the listed problems go away, or at least become far easier to deal with. Yeah, that seems to be the consensus; I am going to work on that next. 3. Single-Pass Mode (External Scheduling): It still might make sense. You could have a mode where the daemon essentially sleeps forever, until explicitly woken up by a signal. When woken, it makes one pass, and goes back to infinite sleep. Then provide a simple way to signal the autovacuum process -- maybe an extension of the current VACUUM syntax. Well one thing we talked about was having the autovacuum process not be a daemon that is running all the time but rather a process that is spawned periodically by the postmaster, so you wouldn't have to worry about the signal / wake-up issues. I could see this working much like checkpoints: the postmaster fires them off on a schedule, but there is nothing stopping you from issuing a checkpoint command to force one immediately. So perhaps there could be a new SQL command like VACUUM AUTO or something like that. 
4. Off-Peak Scheduling: In its simplest form (which I will implement first) I would add the ability to add a second set of thresholds that will be active only during an off-peak time that can be specified in the pg_autovacuum database, perhaps in a general_settings table. I don't know how this would work, but it is for sure important. In the recent testing I found that pg_autovacuum (well, lazy vacuum in general, but I was using pg_autovacuum to control it) made a huge difference in performance of batch transactions. They ranged from 4-5 seconds without vacuum running, to as high as 15 minutes with vacuum running. With the vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as high as 10 minutes. Backing vacuum off any more than that caused it to fall behind the transaction rate unrecoverably. But as I said above, if the transactions could complete in 4-5 seconds without vacuum running, and vacuuming resumed for the 3-to-4 minutes between batches, all would be well. Again, once the vacuum command is issued, it's out of pg_autovacuum's control. There has been some talk of pg_autovacuum looking at the system load to see if it should wait, or passing some different delay settings to vacuum based on system activity, so maybe some of that will help. What I was talking about with off-peak scheduling is really just setting different thresholds for different times of the day (or perhaps for different system loads), so that if you know your app is typically busy from 8AM - 8PM, then you can set more conservative thresholds during that time window, and more aggressive thresholds during the off-peak time. Thanks for the feedback, Matthew
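Matthew's off-peak idea above amounts to selecting one of two threshold sets based on the clock. A minimal sketch, with the peak window (8AM-8PM, as in his example) and all numbers invented for illustration:

```python
from datetime import time

# Two hypothetical threshold sets: conservative during the busy window,
# aggressive overnight. The values are made up, not pg_autovacuum defaults.
PEAK = {"vacuum_scaling_factor": 2.0, "vacuum_base_value": 10_000}
OFF_PEAK = {"vacuum_scaling_factor": 0.5, "vacuum_base_value": 1_000}

PEAK_START, PEAK_END = time(8, 0), time(20, 0)  # 8AM - 8PM

def active_thresholds(now):
    """Return the threshold set in force at wall-clock time `now`."""
    return PEAK if PEAK_START <= now < PEAK_END else OFF_PEAK
```

In the scheme described in the thread, the window itself would come from a general_settings table in the pg_autovacuum database rather than being hard-coded.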
Re: [HACKERS] pg_autovacuum next steps
Gavin Sherry wrote: I was initially against the idea of using libpq but it's growing on me too. I think it would be good if the core functions of pg_autovacuum (threshold algorithms, connection handling, issuing commands) could be (re)designed so that not only the backend but also a stripped-down pg_autovacuum binary, usable manually, can link against them. That way users have a choice and a workaround if there are any issues with the backend model. Also useful for debugging. I agree. Initially, it appears that this won't be a problem since the pg_autovacuum executable will be untouched (or as untouched as possible anyway); it will just be launched by the backend. Going forward, if I use any of the functionality provided by the backend (error reporting etc...) I will then have to design it to work in both standalone and postmaster sub-process modes, which I think is doable. Matthew ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor wrote: Could you please explain this better, I don't really understand what the problem is. If you want pg_autovacuum to perform a vacuum on a table that has had exactly X updates no matter what, you can just run it with -V0 -vX (where X is the vacuum threshold); the same thing can be done for analyze. Are you saying that you did this and it did not work as expected? I think the problem with that (I initially tried it myself) is that without a per-table setting, it is impractical. For instance, I want the one large table in my app vacuumed every 500,000 dead tuples, but now my smaller tables never reach the threshold. Joe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
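The threshold logic the thread keeps circling around is: vacuum a table once its dead-tuple count exceeds a base value plus a scaling factor times the table's size (the -v and -V flags being discussed). A sketch of it with per-table overrides, which is what Joe is asking for; the default numbers and dictionary shape are illustrative only, not the contrib module's actual implementation:

```python
# Hypothetical daemon-wide defaults (values invented for illustration).
DEFAULTS = {"vacuum_base_value": 1000, "vacuum_scaling_factor": 2.0}

def needs_vacuum(dead_tuples, reltuples, overrides=None):
    """True once dead tuples exceed base_value + scaling_factor * reltuples.

    Per-table `overrides` shadow the daemon-wide defaults, which is the
    per-table-threshold feature being requested in this thread.
    """
    cfg = {**DEFAULTS, **(overrides or {})}
    threshold = cfg["vacuum_base_value"] + cfg["vacuum_scaling_factor"] * reltuples
    return dead_tuples > threshold

# Joe's large table: vacuum every 500,000 dead tuples regardless of size,
# i.e. scaling factor 0 with a fixed base (like running with -V0 -v500000),
# but applied to just this one table instead of the whole cluster.
big_table = {"vacuum_scaling_factor": 0, "vacuum_base_value": 500_000}
```

With only the global flags, setting -V0 -v500000 for the big table means small tables (far fewer than 500,000 dead tuples) are never vacuumed, which is exactly the impracticality Joe describes.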
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor wrote: Joe Conway wrote: This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal. I'm not sure what you are suggesting here. As it stands right now, pg_autovacuum just issues a standard vacuum command, so there isn't anything pg_autovacuum can do until that command completes. There has been a lot of work going on trying to reduce the performance impact associated with a vacuum (vacuum delay, ARC etc); hopefully that will make a big difference. Hopefully it will. I've not had a chance yet to test against cvs tip. I guess what I was suggesting would have to be built in to the lazy vacuum process itself, similar to the vacuum delay stuff. I have no idea if it would be practical, but ideally every X number of pages there would be a way to detect current load on the postgres cluster or current load on the server, and then either sleep, or go full bore, depending on the result and some threshold setting. Kind of like the vacuum delay patch, but with a "don't bother to sleep if the system is idle" behavior. Joe ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
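Joe's load-aware variant of the vacuum delay patch can be sketched as follows. This is a speculative illustration, not backend code: the function, the busy-load cutoff, and the page loop are all invented, and the 1-minute load average (via os.getloadavg, Unix-only) stands in for whatever load metric the backend would actually consult:

```python
import os
import time

def vacuum_with_adaptive_delay(total_pages, pagecount=8, delay_s=0.001,
                               busy_load=1.0, loadavg=os.getloadavg):
    """Every `pagecount` pages, sleep only if the system looks busy.

    Mirrors the delay/pagecount knobs mentioned in the thread (delay = 1,
    pagecount = 8), plus Joe's "don't sleep if idle" check. `loadavg` is
    injectable so the behavior can be exercised without a loaded machine.
    Returns the number of naps taken, for illustration.
    """
    naps = 0
    for page in range(total_pages):
        # ... process one heap page here ...
        if page and page % pagecount == 0:
            if loadavg()[0] >= busy_load:
                time.sleep(delay_s)   # system busy: back off
                naps += 1
            # system idle: go full bore, no sleep
    return naps
```

On an idle machine this degenerates to a plain full-speed vacuum; under load it behaves like the existing delay patch.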
Re: [HACKERS] pg_autovacuum next steps
On Mon, Mar 22, 2004 at 04:50:57PM -0500, Matthew T. O'Connor wrote: Could you please explain this better, I don't really understand what the problem is. If you want pg_autovacuum to perform a vacuum on a table that has had exactly X updates no matter what, you can just run it with -V0 -vX (where X is the vacuum threshold); the same thing can be done for analyze. Are you saying that you did this and it did not work as expected? My experience was it did not vacuum at all. Everyone agrees it should be in the backend eventually; the questions were just whether it was ready, and how or if it should change when it goes in. It is very simple code; I'm not sure what is not ready about it. If people don't want to use it, they don't have to. Many databases can go days to weeks without vacuuming. alex -- [EMAIL PROTECTED] Alex J. Avriette, Unix Systems Gladiator You cannot invade the mainland United States. There would be a rifle behind each blade of grass. - Admiral Isoroku Yamamoto
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor [EMAIL PROTECTED] writes: Lately I have been thinking about the next steps for the pg_autovacuum daemon. I have written up a document that describes what I'm planning to do next. Please read the attached and respond, as I would really like some feedback. [ rtf document ] Please repost in some less proprietary format. Plain text is generally considered the thing to use on this list. regards, tom lane
Re: [HACKERS] pg_autovacuum next steps
On 22.03.2004 at 00:12, Tom Lane wrote: Please repost in some less proprietary format. Plain text is generally considered the thing to use on this list. -- Markus Bertheau [EMAIL PROTECTED]
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor wrote: Lately I have been thinking about the next steps for the pg_autovacuum daemon. I have written up a document that describes what I'm planning to do next. Please read the attached and respond, as I would really like some feedback. I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] pg_autovacuum next steps
I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. Yeah - why delay making it a backend process? :) Chris
Re: [HACKERS] pg_autovacuum next steps
On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote: I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. Yeah - why delay making it a backend process? :) Ok, well this was part of the reason to have this conversation. My reasons: A) I wasn't sure if people really thought this was ready to be integrated. Tom had said a while ago that it was good to keep it as a contrib module while it's still actively being developed. B) Perhaps people like the idea of it being a client app (I don't think so.) C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. So the first big question is: Do we want to make it a backend subprocess now? Secondly, are there any other features that people are interested in that were not mentioned in my document? Matthew O'Connor
Re: [HACKERS] pg_autovacuum next steps
On Sun, 2004-03-21 at 18:12, Tom Lane wrote: Matthew T. O'Connor [EMAIL PROTECTED] writes: [ rtf document ] Please repost in some less proprietary format. Plain text is generally considered the thing to use on this list. I don't think RTF is proprietary, but I should have just posted inline anyway, so here is a copy: pg_autovacuum Version 2 Design Document: Exec Summary: pg_autovacuum was initially released as a contrib module in PostgreSQL v7.4. The version in 7.4 is by design very simple. No configuration is required, and very little configuration is possible. Despite these limitations it was voted the most popular new feature of PostgreSQL v7.4 according to the survey held on postgresql.org (http://www.postgresql.org/survey.php?View=1&SurveyID=23). Despite its popularity there is much room for improvement. This document sets out to define the most important improvements that would help pg_autovacuum become a truly powerful asset in the suite of tools that come with PostgreSQL. Current Problems & Limitations: Based on user feedback from people using pg_autovacuum in the field, and my own observations, there are a number of problems and limitations with pg_autovacuum. They are: * Inability to customize thresholds on a per table basis * Inability to set default thresholds on a per database basis * Inability to exclude specific databases / tables from pg_autovacuum monitoring * Inability to schedule vacuums during off-peak times * Lack of integration related to startup and shutdown * Ignorance of VACUUM and ANALYZE operations performed outside pg_autovacuum (requires backend integration? or can listen / notify be used?) * Lack of logging options / syslog integration / log rotation options * Create table fails because template1 is busy I'm not sure how to address all of these concerns, or that they all should be addressed right now. One of my big questions is backend integration. 
I am leaning towards leaving pg_autovacuum as a client application in contrib for one more release. During this time, I can continue to tweak and improve pg_autovacuum so that we will have a very good idea what the final product should be before we make it a standard backend process. For PostgreSQL 7.5, I plan to implement these new features: 1. Per database defaults and per table thresholds (including total exclusion) 2. Persistent data 3. Single-pass mode (external scheduling from cron etc...) 4. Off-peak scheduling 1. Per Database Defaults and Per Table Thresholds: There are differing opinions as to the best way to provide this feature. The primary debate is where to save the configuration data. I see three options: 1. Store config data inside a special pg_autovacuum table inside existing databases that want custom settings. 2. Use a config file. This would require some additional coding to add the required parsing, but is possible. 3. Create a pg_autovacuum database inside any cluster that wants to customize its settings. Since many people do not like tools that clutter their databases by adding tables, I think option 1 (adding a pg_autovacuum table to existing databases) is right out. Using a config file would be OK, but would require additional parsing code. My preference is option 3. Since pg_autovacuum will (hopefully) eventually become an integrated part of the backend, it will eventually be able to add required data to the system catalogs. Given these two premises, as long as pg_autovacuum remains a contrib module it could use its own database to mimic having system tables. If this database exists, it will be used; if it does not exist, then pg_autovacuum will work just as it did in the 7.4 release with very limited options available to it. The user will be able to specify a non-default database. 
Table structure for database-specific defaults and table-specific thresholds:

databases_defaults: (will reference the pg_database system table)
    id serial primary key
    exclude_database boolean
    default_vacuum_scaling_factor float
    default_vacuum_base_value int
    default_analyze_scaling_factor float
    default_analyze_base_value int
    dboid oid references pg_database.oid

table_thresholds: (will reference the pg_class system table)
    id serial primary key
    exclude_table boolean (exclude this table)
    vacuum_scaling_factor float (equivalent to -V)
    vacuum_base_value int (equivalent to -v)
    vacuum_threshold float (if 0, use this threshold)
    analyze_scaling_factor float (equivalent to -A)
    analyze_base_value int (equivalent to -a)
    analyze_threshold float (if 0, use this threshold)
    relid oid references pg_class.relid

2. Persistent pg_autovacuum Data: Right now pg_autovacuum has no memory of what
Re: [HACKERS] pg_autovacuum next steps
On Sun, 21 Mar 2004, Matthew T. O'Connor wrote: On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote: I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. Yeah - why delay making it a backend process? :) Ok, well this was part of the reason to have this conversation. My reasons: A) I wasn't sure if people really thought this was ready to be integrated. Tom had said a while ago that it was good to keep it as a contrib module while it's still actively being developed. I was talking to Jan about some other work on VACUUM related to more intelligent vacuuming. Namely, maintaining a map (outside of shared memory) of blocks which have been pushed out of the free space map for VACUUM to visit (which requires a backend process), being aware of load restrictions (ie, allowing the user to vacuum only when the load average is less than X, for example), and some other leveling stuff to ensure that availability is consistent. Whilst this doesn't relate to pg_autovacuum specifically, it'd be great if they could be released at the same time, I think. B) Perhaps people like the idea of it being a client app (I don't think so.) I'd like to see it as part of the backend. C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. I'd be happy to help you out. Gavin
Re: [HACKERS] pg_autovacuum next steps
B) Perhaps people like the idea of it being a client app (I don't think so.) I'd like to see it as part of the backend. C) Most importantly, I'm not backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. I'd be happy to help you out. Agreed. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match