I have added an auto-vacuum TODO item:
* Auto-vacuum
o Move into the backend code
o Scan the buffer cache to find free space or use background writer
o Use free-space map information to guide refilling
---------------------------------------------------------------------------
Russell Smith wrote:
> Hi All,
>
> I am doing serious thinking about the implementation of Auto Vacuum as part
> of the backend, Not using libpq, but classing internal functions directly.
> It appears to me that calling internal functions directly is a better
> implementation than using the external library to do the job.
>
> I know I might be stepping on Matthew's toes, but I don't really want to. I
> am a complete newbie to the postgresql code, however I am trying.
> Vacuum appears to be one of the bigger saw points with administrator having
> to configure it via scheduled tasks.
>
> I have outlined things I have thought about below. I've surely missed a lot,
> and am open to feedback. Others may like the current tuning used
> by the external autovacuum, however to have stats that are collected at
> vacuum, and used for later vacuum would mean we don't need the
> stats collector running.
>
> The major autovacuum issues
>
> 1. Transaction Wraparound
> 2. Vacuum of relations
> 3. Tracking of when to do vacuums
> 4. Where to store information needed by auto vacuum
>
> I would suggest the first step is to replicate the wraparound and relation
> vacuuming from the current code first. But I could be wrong.
> Currently there are a lot of tuning options in the external autovacuum, and
> the best way to run vacuum internally will need to be thought out.
>
> 1. Transaction Wraparound
>
> It appears from the code, that the best time to do a transaction wraparound
> db wide vacuum is when the
> frozenXID has wrapped, TransactionIdPrecedes(myXID, dbform->datfrozenxid).
> And is probably the most
> simple vacuum to implement.
>
>
> 2. Vacuuming of relations
>
> Currently, the entire heap must be vacuumed at one time. I would possible be
> desireable to have only part of the relation vacuumed at
> a time. If you can find out which parts of the relation have the most slack
> space. There is a todo item regarding tracking recent deletions
> so they can be resused. Some form of this would be helpful to work out what
> to vacuum. Performance issues for this type of activity
> may be a concern. But I have no experience to be able to make comment on
> them. So I welcome yours.
>
> 3. Tracking of when to vacuum
>
> Current autovacuum relies the stats collector to be running. I would like to
> only use the stats if they are available,
> and have an option to be able to vacuum accurately without having to have
> stats running.
>
> By adding certain histograms, on tuples, filesize and slack space we can
> guage the time between vacuums
>
> number of tuples will show the inserts as opposed to updates.
> file size will show that the file is growing and by how much between vacuums.
> slack space will show the delete/updated records.
>
> A new guc and relation option would need to be implemented to give a target
> slack space in a file.
> this will help to reduce growth in relations if vacuum happens to not run
> frequently enough. This information
> can also inform autovacuum that it should be vacuuming more frequently. The
> number would be a percentage,
> eg 10% of the total file size is allowed to be unused. Also alter table
> would allow users to set levels of slackness
> for each relation. If the table gets too much more than the target slack
> space, a "partial" vacuum full could be run
> to reduce the size of the table by moving tuples at the end of the table to
> slack space near the beginning and
> shortening the table length. It would require a full table lock, but you may
> be able to space it out, to only do a page
> at a time.
>
> /* target percentage of slack space */
> vacuum_default_target_slack_space = 10
>
> ALTER TABLE SET TARGET SLACK SPACE = 10;
>
> 4. Where to store information required by auto vacuum.
>
> Auto vacuum needs somewhere to stop the information it needs about current
> and future vacuuming. I am unsure of where
> to put this. It appears as if it will have a number of fields. I feel like
> pg_class is the best place to put the information, but
> again I have no idea.
>
> That's the best I can do for now. I can clarify things further if required.
>
> Regards
>
> Russell.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>
--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]