Hi,

Gokulakannan Somasundaram wrote:
If we can ask the Vacuum process to scan the WAL log, it can get all the relevant details on where it needs to go.

You seem to be assuming that only a few tuples change between vacuums, so that the WAL could quickly guide the VACUUM processes to the areas where cleaning is necessary.

Let's drop that assumption: the default autovacuum_vacuum_scale_factor is 20%, so a VACUUM normally kicks in only after 20% of a table's tuples have changed (disk space is cheap, I/O isn't). Additionally, there's a default nap time of one minute (autovacuum_naptime), and VACUUM is forced to take at least that much of a nap.
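To make the numbers concrete, the trigger condition autovacuum uses is roughly the following (a simplified sketch of the formula from the manual, not actual PostgreSQL code; the parameter defaults are the stock ones):

```python
def autovacuum_due(n_live_tuples, n_dead_tuples,
                   base_threshold=50, scale_factor=0.2):
    """Sketch of the autovacuum trigger condition, per the manual:
    vacuum threshold = autovacuum_vacuum_threshold
                     + autovacuum_vacuum_scale_factor * number of tuples.
    Returns True once enough tuples are dead to trigger a VACUUM."""
    threshold = base_threshold + scale_factor * n_live_tuples
    return n_dead_tuples > threshold

# With the defaults, a table of 1,000,000 live tuples is only vacuumed
# once more than 200,050 of its tuples have died.
```

So by the time VACUUM runs, a substantial fraction of the table is dead, which is exactly why "only a few tuples changed" is not a safe assumption.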

So it's easily possible to have more dead tuples than live ones. In such cases, scanning the WAL can easily take *longer* than scanning the table, because the amount of WAL to read would be bigger.

One main restriction this places on the WAL is that a WAL segment may only be archived after all the transactions in it have completed. In other words, the WAL needs to be given enough space to survive the longest transaction in the database. It is possible to avoid this situation by asking the VACUUM process to take the necessary information out of the WAL, store it somewhere, and wait for the long-running transaction to complete.

That would result in even more I/O...

The information of interest in the WAL is only the table inserts/updates/deletes. So if everyone accepts that this is a good idea up to this point, there is a point in reading further.

Well, that's the information of interest; the question is where to store it. Maintaining a dead space map looks a lot cheaper to me than relying on the WAL to store that information.
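To illustrate why it's cheaper: a dead space map can be as small as one bit per heap block, flagging blocks that may contain dead tuples, so VACUUM visits only those. This is a hypothetical sketch in Python (the class name and interface are made up, not PostgreSQL code):

```python
class DeadSpaceMap:
    """One bit per heap block; a bit is set when that block may hold
    dead tuples. A VACUUM would then visit only the flagged blocks
    instead of sequentially scanning the whole table."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)

    def mark_dead(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def clear(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8))

    def blocks_to_vacuum(self):
        return [b for b in range(self.n_blocks)
                if self.bits[b // 8] & (1 << (b % 8))]
```

At one bit per 8 kB block, that's about 16 kB of map per GB of heap, vastly less I/O than replaying the WAL for the same period.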

Ultimately, what has been achieved up to now is that we have turned the sequential scans the VACUUM process makes over each table into a few random I/Os. Of course there are optimizations possible to group the random I/Os and extract some sequential I/O out of them. But we still need to do a full scan of all the indexes out there. HOT might have saved some work there, but I am pessimistic and wondering how it could be improved. So it strikes me that we can do the same thing we just did with the tables: convert a sequential scan of the entire table into a random scan of a few blocks. We can read the necessary tuple information from the tuples, group them, hit the index in just those blocks, and clean them up.
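The grouping of random I/Os mentioned above essentially amounts to sorting dead tuple identifiers by block number, so that each block is visited once and in physical order. A minimal sketch (the function name and TID representation are invented for illustration):

```python
from itertools import groupby

def group_tids_by_block(dead_tids):
    """Sort (block, offset) tuple identifiers and batch them per block,
    turning scattered single-tuple visits into one ordered visit per
    block, which the OS can often serve partly sequentially."""
    ordered = sorted(dead_tids)
    return {block: [off for _, off in tids]
            for block, tids in groupby(ordered, key=lambda t: t[0])}
```

For example, four dead tuples scattered over blocks 7 and 1 collapse into two block visits, in ascending block order.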

Sorry, I don't quite get what you are talking about here. What do indexes have to do with dead space? Why not just keep acting on the block level?

I can already hear people saying that it is not always possible to go back from the table to the index. There is this culprit called unstable function-based indexes.

No, there's no such thing. Citing [1]: "All functions and operators used in an index definition must be "immutable", that is, their results must depend only on their arguments and never on any outside influence".

Of course, you can mark any function IMMUTABLE and get unstable function based indexes, but that turns into a giant foot gun very quickly.

P.S.: Let the objections/opposing views have a subtle reduction in their harshness.

I'm just pointing at things that are in conflict with my knowledge, assumptions and beliefs, all of which might be erroneous, plain wrong or completely mad. ;-)

Regards

Markus

[1]: the Very Fine Postgres Manual on CREATE INDEX:
http://www.postgresql.org/docs/8.3/static/sql-createindex.html
