[GENERAL] trying to use CLUSTER

Sahagian, David Tue, 12 Feb 2013 14:39:55 -0800

Version=9.1.7

INFO: clustering "my_cool_table"  using sequential scan and sort
INFO: "my_cool_table": found 1 removable, 1699139 nonremovable row versions in 
49762 pages
Detail: 1689396 dead row versions cannot be removed yet.
CPU 9.80s/4.98u sec elapsed 175.92 sec.


INFO: clustering "my_cool_table"  using sequential scan and sort
INFO: "my_cool_table": found 7552 removable, 21732 nonremovable row versions in 
50007 pages
Detail: 11482 dead row versions cannot be removed yet.
CPU 0.01s/0.23u sec elapsed 36.29 sec.

INFO: clustering "my_cool_table"  using index scan on "pk_cool"
INFO: "my_cool_table": found 621462 removable, 36110 nonremovable row versions 
in 26135 pages
Detail: 25128 dead row versions cannot be removed yet.
CPU 0.02s/0.35u sec elapsed 0.79 sec.

So my_cool_table gets inserted into (but not updated) by regular processes 
doing their smallish CRUD transactions.

Concurrently, ONE process repeatedly "sweeps" a chunk of rows from the table 
every few seconds.
(ie, it does delete...returning, and then commits the sweep)
Note that if the table has not many rows, then all the rows will be swept 
together.

It is possible for something to go wrong resulting in:
  the table still being filled, but no longer being swept.

When the sweeping finally gets re-started, it must now chomp down a very large 
table.
When it finally sweeps down to near zero rows remaining, my idea was to do a 
CLUSTER on the table.
My expectation is that a VERY SMALL percentage of the row versions would 
actually get written to the new table!

My hope is that a smaller heap is better, now that the rate of sweeping is back 
to the rate of filling,
with the assumption that it will stay this way 99% of the time.


Can somebody tell me why some "dead row versions cannot be removed yet" ?
I assume that means CLUSTER must write them to the new table ?

It seems very costly to do the CLUSTER, if the new table is not really going to 
be a tiny fraction of the old table.
Is there a way for me to discover the approx number of "non-removables" BEFORE 
I do the CLUSTER ?
? Some pg_table query ? maybe after an analyze ?

Also, does the use of [index scan on "pk_cool"] basically depend on the ratio 
of removable/nonremovable row versions ?

Thanks,
-dvs-

[GENERAL] trying to use CLUSTER

Reply via email to