Re: [PERFORM] Index Bloat Problem

2012-08-18 Thread Greg Williamson
Thanks for this description--we have index bloat problems on a massively active
(but small) database. This may help shed light on our problems.

Sorry for top-posting--challenged email reader.

Greg W.








Re: [PERFORM] Index Bloat Problem

2012-08-17 Thread Jeff Janes
On Thu, Aug 16, 2012 at 12:57 PM, Strahinja Kustudić
strahin...@nordeus.com wrote:

 @Jeff I'm not sure I understand what you mean. I know that we never reuse
 key ranges. Could you be more clear, or give an example, please?

If an index leaf page is completely empty because every entry on it
was deleted, it will get recycled to be used in some other part of
the index.  (Eventually--it can take a while, especially if you have
long-running transactions.)

But if the leaf page is only mostly empty, because most of the
entries on it were deleted, then it can never be reused, except for
entries that naturally fall into its existing key range (which will
never happen, if you never reuse key ranges).

So if you have a million records with keys 1..1000000, and do a
"delete from foo where key between 1 and 990000", then 99% of those
old index pages will become completely empty and eligible for reuse.
But if you do "delete from foo where key % 100 > 0", then all of the
pages will become 99% empty, and none will be eligible for reuse
(except the very last one, which can still accept 1000001 and so on).
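
To make that concrete, here is an untested sketch (table name and key
range as in the example above; exact sizes will vary):

create table foo (key bigint primary key);
insert into foo select generate_series(1, 1000000);
select pg_relation_size('foo_pkey');      -- baseline index size

delete from foo where key between 1 and 990000;  -- empties whole leaf pages
vacuum foo;   -- the now-empty pages become recyclable for future inserts

-- versus, on a freshly loaded copy of foo:
-- delete from foo where key % 100 > 0;   -- leaves every page ~99% empty
-- vacuum foo;                            -- almost nothing can be recycled

Either way the index file doesn't shrink; the difference is whether
later inserts (keys 1000001 and up) can reuse the space.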

There has been talk of allowing logically adjacent, mostly empty
pages to be merged so that one of them becomes empty, but given the way
concurrent access to btree indexes was designed, this is extremely hard
to do safely.

Cheers,

Jeff




Re: [PERFORM] Index Bloat Problem

2012-08-16 Thread Strahinja Kustudić
Thanks for the help, everyone, and sorry for not replying sooner; I was
on a business trip.

@Hubert pg_reorg looks really interesting, and from a first read it
looks like a very good solution for maintenance, but for now I would
rather try to slow down or remove this bloat, so that I have to do as
little maintenance as possible.
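
If we end up using it, I assume from a quick look at its docs that the
invocation would be something like the following (table and database
names are made up, and I haven't verified the options):

pg_reorg --no-order --table=public.hot_table mydb

where --no-order supposedly does an online VACUUM FULL-style rewrite
instead of reordering by an index.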

@Mark So basically I should decrease the autovacuum nap time from 60s to
10s and reduce the scale factor from 0.2 to 0.1. log_autovacuum_min_duration
is already set to 0, which means everything is logged.
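
I could also apply that per table instead of globally, for the hottest
tables only, with something like (table name and values are just an
example):

ALTER TABLE hot_table SET (autovacuum_vacuum_scale_factor = 0.05);
ALTER TABLE hot_table SET (autovacuum_vacuum_threshold = 5000);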

@Jeff I'm not sure I understand what you mean. I know that we never
reuse key ranges. Could you be more clear, or give an example, please?

Thanks in advance,
Strahinja



On Tue, Aug 14, 2012 at 6:14 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Fri, Aug 10, 2012 at 3:15 PM, Strahinja Kustudić
 strahin...@nordeus.com wrote:
 
  For example, yesterday when I checked the database size on the production
  server it was 30GB, and the restored dump of that database was only 17GB.
  The most interesting thing is that the data wasn't bloated that much, but
  the indices were. Some of them were a few times bigger than they should be.
  For example an index on the production db is 440MB, while that same index
  after dump/restore is 17MB, and there are many indices with similarly
  large differences.

 Could your pattern of deletions be leaving sparsely populated, but not
 completely empty, index pages; which your insertions will then never
 reuse because they never again insert values in that key range?

 Cheers,

 Jeff



Re: [PERFORM] Index Bloat Problem

2012-08-13 Thread Mark Kirkwood

On 11/08/12 10:15, Strahinja Kustudić wrote:

We have PostgreSQL 9.1 running on CentOS 5 on two SSDs, one for indices and
one for data. The database is extremely active with reads and writes. We
have autovacuum enabled, but we didn't tweak its aggressiveness. The
problem is that after some time the database grows by more than 100% on
the file system, and most of the growth is because the indices are a few
times bigger than they should be; when this happens, the performance of
the DB drops.

For example, yesterday when I checked the database size on the production
server it was 30GB, and the restored dump of that database was only 17GB.
The most interesting thing is that the data wasn't bloated that much, but
the indices were. Some of them were a few times bigger than they should be.
For example an index on the production db is 440MB, while that same index
after dump/restore is 17MB, and there are many indices with similarly
large differences. We could fix the problem if we reindexed the DB, but
that takes the DB offline, which is not possible in the production
environment.

Is there a way to make the autovacuum daemon more aggressive, since I'm not
exactly sure how to do that in this case? Would that even help? Is there
another way to remove this index bloat?




Some workloads can be difficult to tame. However, I would try something
like this in postgresql.conf:


autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.1

and maybe set log_autovacuum_min_duration so you can see what
autovacuum is doing.


If the above settings don't help, then you could maybe monitor growth 
and schedule regular REINDEXes on the tables concerned (at some suitably 
quiet time).
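
For the monitoring part, something along these lines (standard
statistics views; untested here) would let you watch index sizes over
time:

SELECT relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;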


Regards

Mark








Re: [PERFORM] Index Bloat Problem

2012-08-13 Thread Jeff Janes
On Fri, Aug 10, 2012 at 3:15 PM, Strahinja Kustudić
strahin...@nordeus.com wrote:

 For example, yesterday when I checked the database size on the production
 server it was 30GB, and the restored dump of that database was only 17GB.
 The most interesting thing is that the data wasn't bloated that much, but
 the indices were. Some of them were a few times bigger than they should be.
 For example an index on the production db is 440MB, while that same index
 after dump/restore is 17MB, and there are many indices with similarly
 large differences.

Could your pattern of deletions be leaving sparsely populated, but not
completely empty, index pages; which your insertions will then never
reuse because they never again insert values in that key range?

Cheers,

Jeff




Re: [PERFORM] Index Bloat Problem

2012-08-11 Thread hubert depesz lubaczewski
On Sat, Aug 11, 2012 at 12:15:11AM +0200, Strahinja Kustudić wrote:
 Is there a way to make the autovacuum daemon more aggressive, since I'm not
 exactly sure how to do that in this case? Would that even help? Is there
 another way to remove this index bloat?

http://www.depesz.com/index.php/2011/07/06/bloat-happens/

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/



Re: [PERFORM] Index bloat problem?

2005-04-22 Thread Tom Lane
Bill Chandler [EMAIL PROTECTED] writes:
 Client is reporting that the size of an index is
 greater than the number of rows in the table (1.9
 million vs. 1.5 million).

This thread seems to have wandered away without asking the critical
question: what did you mean by that?

It's not possible for an index to have more rows than there are in
the table unless something is seriously broken.  And there aren't
any SQL operations that let you inspect an index directly anyway.
So: what is the actual observation that led you to the above
conclusion?  Facts, please, not inferences.

regards, tom lane



Re: [PERFORM] Index bloat problem?

2005-04-22 Thread Tom Lane
David Roussel [EMAIL PROTECTED] writes:
 |dave_data_update_events              | r | 1593600.0 | 40209
 |dave_data_update_events_event_id_key | i | 1912320.0 | 29271

Hmm ... what PG version is this, and what does VACUUM VERBOSE on
that table show?

regards, tom lane



Re: [PERFORM] Index bloat problem?

2005-04-22 Thread David Roussel
On Fri, 22 Apr 2005 10:06:33 -0400, Tom Lane [EMAIL PROTECTED] said:
 David Roussel [EMAIL PROTECTED] writes:
  |dave_data_update_events              | r | 1593600.0 | 40209
  |dave_data_update_events_event_id_key | i | 1912320.0 | 29271
 
 Hmm ... what PG version is this, and what does VACUUM VERBOSE on
 that table show?

PG 7.4

The disparity seems to have sorted itself out now, thus hampering further
investigation. I guess the regular inserts of new data and the nightly
deletion and index recreation did it.  However, we did suffer reduced
performance and the strange cardinality for several days before it went
away.  For what it's worth...

ndb=#  vacuum verbose iso_pjm_data_update_events;
INFO:  vacuuming "public.iso_pjm_data_update_events"
INFO:  index "iso_pjm_data_update_events_event_id_key" now contains
1912320 row versions in 29271 pages
DETAIL:  21969 index pages have been deleted, 2 are currently
reusable.
CPU 6.17s/0.88u sec elapsed 32.55 sec.
INFO:  index "iso_pjm_data_update_events_lds_idx" now contains 1912320
row versions in 7366 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 3.52s/0.57u sec elapsed 14.35 sec.
INFO:  index "iso_pjm_data_update_events_obj_id_idx" now contains
1912320 row versions in 7366 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 3.57s/0.58u sec elapsed 12.87 sec.
INFO:  "iso_pjm_data_update_events": found 0 removable, 1912320
nonremovable row versions in 40209 pages
DETAIL:  159384 dead row versions cannot be removed yet.
There were 745191 unused item pointers.
0 pages are entirely empty.
CPU 18.26s/3.62u sec elapsed 74.35 sec.
VACUUM

After each insert it does this...

VACUUM ANALYZE iso_pjm_DATA_UPDATE_EVENTS
VACUUM ANALYZE iso_pjm_CONTROL

Each night it does this...

BEGIN
DROP INDEX iso_pjm_control_obj_id_idx
DROP INDEX iso_pjm_control_real_name_idx
DROP INDEX iso_pjm_data_update_events_lds_idx
DROP INDEX iso_pjm_data_update_events_obj_id_idx
CREATE UNIQUE INDEX iso_pjm_control_obj_id_idx ON
iso_pjm_control(obj_id)
CLUSTER iso_pjm_control_obj_id_idx ON iso_pjm_control
CREATE UNIQUE INDEX iso_pjm_control_real_name_idx ON
iso_pjm_control(real_name)
CREATE INDEX iso_pjm_data_update_events_lds_idx ON
iso_pjm_data_update_events(lds)
CREATE INDEX iso_pjm_data_update_events_obj_id_idx ON
iso_pjm_data_update_events(obj_id)
COMMIT

Note there is no reference to iso_pjm_data_update_events_event_id_key
which is the index that went wacky on us.  Does that seem weird to you?

Thanks

David



Re: [PERFORM] Index bloat problem?

2005-04-22 Thread Jim C. Nasby
You would be interested in
http://archives.postgresql.org/pgsql-hackers/2005-04/msg00565.php

 On Thu, Apr 21, 2005 at 03:33:05PM -0400, Dave Chapeskie wrote:
  See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php
  for my thoughts on a non-blocking alternative to REINDEX.
  [...]
  Once I've finished it up I'll prepare and post a patch.

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [PERFORM] Index bloat problem?

2005-04-22 Thread Tom Lane
David Roussel [EMAIL PROTECTED] writes:
 Note there is no reference to iso_pjm_data_update_events_event_id_key
 which is the index that went wacky on us.  Does that seem weird to you?

What that says is that that index doesn't belong to that table.  You
sure it wasn't a chance coincidence of names that made you think it did?

regards, tom lane



Re: [PERFORM] Index bloat problem?

2005-04-22 Thread a3a18850
Quoting Bill Chandler [EMAIL PROTECTED]:

 Running PostgreSQL 7.4.2, Solaris.
 Client is reporting that the size of an index is
 greater than the number of rows in the table (1.9
 million vs. 1.5 million).  Index was automatically
 created from a 'bigserial unique' column.

 We have been running 'VACUUM ANALYZE' very regularly. 
 In fact, our vacuum schedule has probably been
 overkill.  We have been running on a per-table basis
 after every update (many per day, only inserts
 occurring) and after every purge (one per day,
 deleting a day's worth of data).  
 
 What about if an out-of-the-ordinary number of rows
 were deleted (say 75% of rows in the table, as opposed
 to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
 things get out of whack because of that situation?

I gather you mean, out-of-the-ordinary for most apps, but not for this client?

In case nobody else has asked: is your max_fsm_pages big enough to handle all
the deleted pages, across ALL tables hit by the purge? If not, you're
haemorrhaging pages, and VACUUM is probably warning you about exactly that.

If that's not a problem, you might want to consider partitioning the data.
Take a look at inherited tables. For me, they're a good approximation of
clustered indexes (sigh, miss'em) and equivalent to table spaces.

My app is in a similar boat to yours: up to 1/3 of a 10M-row table goes away
every day. For each of the child tables that is a candidate to be dropped,
there is a big prologue txn, which moves (INSERT then DELETE) the good rows
into a child table that is NOT to be dropped. Then BANG: pull the plug on the
tables you don't want. MUCH faster than DELETE: the dropped tables' files'
disk space goes away in one shot, too.
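
A sketch of the shape of it (table names and the keep condition are
made up):

CREATE TABLE events_old () INHERITS (events);
CREATE TABLE events_keep () INHERITS (events);

BEGIN;  -- the big prologue txn
INSERT INTO events_keep SELECT * FROM events_old WHERE keep;
DELETE FROM events_old WHERE keep;
COMMIT;

DROP TABLE events_old;  -- its file's disk space goes away in one shot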

Just my 2c.




[PERFORM] Index bloat problem?

2005-04-21 Thread Bill Chandler
All,

Running PostgreSQL 7.4.2, Solaris.

Client is reporting that the size of an index is
greater than the number of rows in the table (1.9
million vs. 1.5 million).  Index was automatically
created from a 'bigserial unique' column.

Database contains several tables with exactly the same
columns (including 'bigserial unique' column).  This
is the only table where this index is out of line with
the actual # of rows.  

Queries on this table take 40 seconds to retrieve 2000
rows as opposed to 1-2 seconds on the other tables.

We have been running 'VACUUM ANALYZE' very regularly. 
In fact, our vacuum schedule has probably been
overkill.  We have been running on a per-table basis
after every update (many per day, only inserts
occurring) and after every purge (one per day,
deleting a day's worth of data).  

It is theoretically possible that at some time a
process was run that deleted all rows in the table
followed by a VACUUM FULL.  In this case we would have
dropped/recreated our own indexes on the table but not
the index automatically created for the bigserial
column.  If that happened, could that cause these
symptoms?

What about if an out-of-the-ordinary number of rows
were deleted (say 75% of rows in the table, as opposed
to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
things get out of whack because of that situation?

thanks,

Bill



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Bill,

 What about if an out-of-the-ordinary number of rows
 were deleted (say 75% of rows in the table, as opposed
 to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
 things get out of whack because of that situation?

Yes.  You'd want to run REINDEX after an event like that.  As you should now.
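
For example (table name hypothetical):

REINDEX TABLE my_big_table;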

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Bill Chandler

--- Josh Berkus josh@agliodbs.com wrote:
 Bill,

  What about if an out-of-the-ordinary number of rows
  were deleted (say 75% of rows in the table, as opposed
  to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
  things get out of whack because of that situation?

 Yes.  You'd want to run REINDEX after an event like that.  As you
 should now.

 --
 Josh Berkus
 Aglio Database Solutions
 San Francisco
 

Thank you.  Though I must say, that is very
discouraging.  REINDEX is a costly operation, both timewise
and because it locks out other processes from
proceeding.  Updates are constantly coming in and
queries are occurring continuously.  A REINDEX could
potentially bring the whole thing to a halt.

Honestly, this seems like an inordinate amount of
babysitting for a production application.  I'm not
sure if the client will be willing to accept it.  

Admittedly my knowledge of the inner workings of an
RDBMS is limited, but could somebody explain to me why
this would be so?  If you delete a bunch of rows, why
doesn't the index get updated at the same time?  Is
this a common issue among all RDBMSs or is it
something that is PostgreSQL specific?  Is there any
way around it?

thanks,

Bill



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Bill,

 Honestly, this seems like an inordinate amount of
 babysitting for a production application.  I'm not
 sure if the client will be willing to accept it.

Well, then, tell them not to delete 75% of the rows in a table at once.  I 
imagine that operation brought processing to a halt, too.

If the client isn't willing to accept the consequences of their own bad data 
management, I'm not really sure what you expect us to do about it.

 Admittedly my knowledge of the inner workings of an
 RDBMS is limited, but could somebody explain to me why
 this would be so?  If you delete a bunch of rows why
 doesn't the index get updated at the same time?  

It does get updated.  What doesn't happen is the space getting reclaimed.  In 
a *normal* data situation, the dead nodes are recycled for new rows.   But 
doing a massive delete operation upsets that, and generally needs to be 
followed by a REINDEX.

 Is this a common issue among all RDBMSs or is it
 something that is PostgreSQL specific?

Speaking from experience, this sort of thing affects MSSQL as well, although 
the maintenance routines are different.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco






Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Alex Turner
Is:

REINDEX DATABASE blah

supposed to rebuild all indices in the database, or must you specify
each table individually? (I'm asking because I just tried it and it
only did system tables.)

Alex Turner
netEconomist

On 4/21/05, Josh Berkus josh@agliodbs.com wrote:
 Bill,
 
  What about if an out-of-the-ordinary number of rows
  were deleted (say 75% of rows in the table, as opposed
  to normal 5%) followed by a 'VACUUM ANALYZE'? Could
  things get out of whack because of that situation?
 
  Yes.  You'd want to run REINDEX after an event like that.  As you should now.
 
 --
 Josh Berkus
 Aglio Database Solutions
 San Francisco
 




Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Alex,

 REINDEX DATABASE blah

 supposed to rebuild all indices in the database, or must you specify
 each table individually? (I'm asking because I just tried it and it
 only did system tables.)

DATABASE

 Recreate all system indexes of a specified database. Indexes on user tables 
are not processed. Also, indexes on shared system catalogs are skipped except 
in stand-alone mode (see below). 

http://www.postgresql.org/docs/8.0/static/sql-reindex.html

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Chris Browne
josh@agliodbs.com (Josh Berkus) writes:
 Bill,

 What about if an out-of-the-ordinary number of rows
 were deleted (say 75% of rows in the table, as opposed
 to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
 things get out of whack because of that situation?

  Yes.  You'd want to run REINDEX after an event like that.  As you should now.

Based on Tom's recent comments, I'd be inclined to handle this via
doing a CLUSTER, which has the triple heroism effect of:

 a) Reorganizing the entire table to conform with the relevant index order,
 b) Having the effect of VACUUM FULL, and
 c) Having the effect of REINDEX

all in one command.
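
For example, with the 7.4/8.0 syntax (names hypothetical):

CLUSTER my_table_pkey ON my_table;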

It has all of the "oops, that blocked me for 20 minutes" effect of
REINDEX and VACUUM FULL, but at least it doesn't have the effect
twice...
-- 
(format nil [EMAIL PROTECTED] cbbrowne acm.org)
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  I will not tell my Legions of Terror
And he must  be taken alive! The command will be:  ``And try to take
him alive if it is reasonably practical.''
http://www.eviloverlord.com/



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Michael Guerin

  Is this a common issue among all RDBMSs or is it
  something that is PostgreSQL specific?

 Speaking from experience, this sort of thing affects MSSQL as well, although
 the maintenance routines are different.

Yes, this is true with MSSQL too; however, SQL Server implements an
index defrag that doesn't lock up the table:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_dbcc_30o9.asp
DBCC INDEXDEFRAG can defragment clustered and nonclustered indexes on 
tables and views. DBCC INDEXDEFRAG defragments the leaf level of an 
index so that the physical order of the pages matches the left-to-right 
logical order of the leaf nodes, thus improving index-scanning performance.

Every five minutes, DBCC INDEXDEFRAG will report to the user an 
estimated percentage completed. DBCC INDEXDEFRAG can be terminated at 
any point in the process, and *any completed work is retained.*
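
The invocation is along the lines of (database, table, and index names
hypothetical):

DBCC INDEXDEFRAG (mydb, mytable, myindex)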

-michael


Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Michael,

 Every five minutes, DBCC INDEXDEFRAG will report to the user an
 estimated percentage completed. DBCC INDEXDEFRAG can be terminated at
 any point in the process, and *any completed work is retained.*

Keen.  Sounds like something for our TODO list.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Bill Chandler
--- [EMAIL PROTECTED] wrote:
 I gather you mean, out-of-the-ordinary for most
 apps, but not for this client?

Actually, no.  The normal activity is to delete 3-5%
of the rows per day, followed by a VACUUM ANALYZE. 
Then over the course of the day (in multiple
transactions) about the same amount are INSERTed (each
transaction followed by a VACUUM ANALYZE on just the
updated table).  So 75% deletion is just out of the
ordinary for this app.  However, on occasion, deleting
75% of rows is a legitimate action for the client to
take.  It would be nice if they didn't have to
remember to do things like REINDEX or CLUSTER or
whatever on just those occasions.
 
 In case nobody else has asked: is your max_fsm_pages
 big enough to handle all the deleted pages, across ALL
 tables hit by the purge? If not, you're haemorrhaging
 pages, and VACUUM is probably warning you about exactly
 that.

This parameter is most likely set incorrectly.  So
that could be causing problems.  Could that be a
culprit for the index bloat, though?

 [...]

Thanks.

Bill 



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Alex Turner
Same thing happens in Oracle

ALTER INDEX blah rebuild

To force a rebuild.  It will mark the free blocks as 'free' below the
PCTFREE value for the tablespace.

Basically, if you build an index with 10,000 entries, and each entry is
1/4 of a block, the database will write 2,500 blocks to the disk.  If
you delete a random 75% of the index values, you will now have 2,500
blocks that have 75% free space.  The database will reuse that free
space in those blocks as you insert new values, but until then, you
still have 2,500 blocks worth of data on disk that are only 25% full.
Rebuilding the index forces the system to physically re-allocate all
that data space, and now you have just 2,500 entries, which use 625
blocks.

I'm not sure that 'blocks' is the correct term in postgres, it's
segments in Oracle, but the concept remains the same.

Alex Turner
netEconomist

On 4/21/05, Bill Chandler [EMAIL PROTECTED] wrote:
 
  Thank you.  Though I must say, that is very
  discouraging.  REINDEX is a costly operation, both timewise
  and because it locks out other processes from proceeding.
  [...]




Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Dave Chapeskie
On Thu, Apr 21, 2005 at 11:28:43AM -0700, Josh Berkus wrote:
 Michael,
 
  Every five minutes, DBCC INDEXDEFRAG will report to the user an
  estimated percentage completed. DBCC INDEXDEFRAG can be terminated at
  any point in the process, and *any completed work is retained.*
 
 Keen.  Sounds like something for our TODO list.
 
 -- 
 Josh Berkus
 Aglio Database Solutions
 San Francisco

See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php
for my thoughts on a non-blocking alternative to REINDEX.  I got no
replies to that message. :-(


I've almost got a working solution integrated in the backend that does
correct WAL logging and everything.  (Writing the code to write and
replay WAL logs for complicated operations can be very annoying!)

For now I've gone with a syntax of:

  REINDEX INDEX btree_index_name INCREMENTAL;

(For now it's not a proper index AM (access method); instead the
generic index code knows this is only supported for btrees and directly
calls the btree_compress function.)

It's not actually a REINDEX per se in that it doesn't rebuild the whole
index.  It holds brief exclusive locks on the index while it shuffles
items around to pack the leaf pages fuller.  There were issues with the
code I attached to the above message that have been resolved with the
new code.  With respect to the numbers provided in that e-mail the new
code also recycles more pages than before.

Once I've finished it up I'll prepare and post a patch.

-- 
Dave Chapeskie
OpenPGP Key ID: 0x3D2B6B34



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Mischa Sandberg
Quoting Bill Chandler [EMAIL PROTECTED]:

 ... The normal activity is to delete 3-5% of the rows per day,
 followed by a VACUUM ANALYZE. 
...
 However, on occasion, deleting 75% of rows is a 
 legitimate action for the client to take.  

  In case nobody else has asked: is your max_fsm_pages
  big enough to handle all the deleted pages, 
  across ALL tables hit by the purge?

 This parameter is most likely set incorrectly.  So
 that could be causing problems.  Could that be a
 culprit for the index bloat, though?

Look at the last few lines of vacuum verbose output.
It will say something like:

free space map: 55 relations, 88416 pages stored; 89184 total pages needed
  Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared memory.

1000000 here is [max_fsm_pages] from my postgresql.conf.
If the total pages needed is bigger than the pages
fsm is allocated for, then you are bleeding.
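
If so, the fix is to raise it in postgresql.conf, sized above the
total pages needed figure, and restart; the numbers here are only an
example:

max_fsm_relations = 1000
max_fsm_pages = 2000000
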
-- 
Dreams come true, not free. -- S.Sondheim, ITW




Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Bill Chandler
Mischa,

Thanks.  Yes, I understand that not having a large
enough max_fsm_pages is a problem and I think that it
is most likely the case for the client.  What I wasn't
sure of was whether the index bloat we're seeing is the
result of the "bleeding" you're talking about or
something else.

If I deleted 75% of the rows but had a max_fsm_pages
setting that still exceeded the pages required (as
indicated in VACUUM output), would that solve my
indexing problem or would I still need to REINDEX
after such a purge?

regards,

Bill

--- Mischa Sandberg [EMAIL PROTECTED] wrote:
  Look at the last few lines of vacuum verbose output.
  [...]
  If the total pages needed is bigger than the pages
  fsm is allocated for, then you are bleeding.



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Josh Berkus
Bill,

 If I deleted 75% of the rows but had a max_fsm_pages
 setting that still exceeded the pages required (as
 indicated in VACUUM output), would that solve my
 indexing problem or would I still need to REINDEX
 after such a purge?

Depends on the performance you're expecting.  The FSM relates to the
re-use of nodes, not to releasing free space.  So after you've deleted
75% of rows, the index wouldn't shrink.  It just wouldn't grow when you
start adding rows.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread John A Meinel
Bill Chandler wrote:

 Mischa,

 Thanks.  Yes, I understand that not having a large
 enough max_fsm_pages is a problem and I think that it
 is most likely the case for the client.  What I wasn't
 sure of was whether the index bloat we're seeing is the
 result of the "bleeding" you're talking about or
 something else.

 If I deleted 75% of the rows but had a max_fsm_pages
 setting that still exceeded the pages required (as
 indicated in VACUUM output), would that solve my
 indexing problem or would I still need to REINDEX
 after such a purge?

 regards,

 Bill

I don't believe VACUUM re-packs indexes. It just removes empty index
pages. So if you have 1000 index pages all with 1 entry in them, vacuum
cannot reclaim any pages. REINDEX re-packs the pages to 90% full.
fsm just needs to hold enough pages that all requests have free space
that can be used before your next vacuum. It is just a map letting
postgres know where space is available for a new fill.
John
=:-

