Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-24 Thread Vivek Khera


On Sep 18, 2007, at 1:14 AM, Joshua D. Drake wrote:


Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I
reindex/cluster indexes?


If you overrun your max_fsm_pages, no:
  else yes;


my algorithm is: if (true) then yes;

my FSM is way bigger than I ever use (vacuum never reports shortage)  
and I still get bloat that needs to be purged out with a reindex on  
occasion.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-24 Thread Phoenix Kiula
On 24/09/2007, Vivek Khera [EMAIL PROTECTED] wrote:

 my FSM is way bigger than I ever use (vacuum never reports shortage)
 and I still get bloat that needs to be purged out with a reindex on
 occasion.



Vivek,

I feel your pain. But I seem to have (mostly) solved my problem in three ways:

1. Increase the shared_buffer and effective_cache_size settings in
postgresql.conf. There are some websites that suggest that increasing
shared_buffer beyond 40,000 may in fact have counter-intuitive
results, but not in my case. I'm at 60,000 and it seems to work well.
Effective_cache_size is 512000.

2. Reduce the fill factor on your table. This is the single most
performance boost. On a table that is frequently updated on a TEXT
column, I reduced it to 60 and have never looked back. For others, I'm
experimenting with 80 and it seems to be working well.

3. Make your autovacuum settings as aggressive as can be. Basically I
found that doing a cronjob of vacuuming every five hours worked really
well, which suggested that autovacuum was not really kicking in as
often it was needed. So I reduced the threshold (100 for vacuum, 80
for analyze...i.e., the number of tuples that get updated before
either process kicks in) and reduced quite aggressively the scores.
Here are my settings:

autovacuum   = on
autovacuum_vacuum_cost_delay = 10
vacuum_cost_delay= 10
autovacuum_naptime   = 10
autovacuum_vacuum_threshold  = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor  = 0.02
autovacuum_vacuum_scale_factor   = 0.01


Hope this helps some.

PK.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-24 Thread Vivek Khera


On Sep 24, 2007, at 12:00 PM, Phoenix Kiula wrote:

I feel your pain. But I seem to have (mostly) solved my problem in  
three ways:


My particular usage pattern (add data continuously, purge *some* of  
the data once per week or every other week.  The purge is what seems  
to kill it.  Last time I reindexed, I reclaimed over 20Gb of disk  
space.  That was after 6 months from the prior reindex.


Recommending I run vacuum intermixed with the data purge is a non- 
starter; the vacuum on these tables takes a couple of hours.  I'd  
never finish purging my data with that kind of delay.


Recommending splitting my tables with inheritance is not going to  
work since my purges don't happen across any lines which would make  
sense for splitting with inheritance.


I will investigate the fill-factor.  That seems like it may make some  
sense the way I do inserts and updates...



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-24 Thread Gregory Stark

Vivek Khera [EMAIL PROTECTED] writes:

 On Sep 24, 2007, at 12:00 PM, Phoenix Kiula wrote:

 I feel your pain. But I seem to have (mostly) solved my problem in  three
 ways:

 My particular usage pattern (add data continuously, purge *some* of  the data
 once per week or every other week.  The purge is what seems  to kill it.  Last
 time I reindexed, I reclaimed over 20Gb of disk  space.  That was after 6
 months from the prior reindex.

Do you have a pattern of loading a ton of data covering a range of indexed key
values and then deleting all but a few values spread evenly throughout that
range? And then never inserting new key values in those ranges again? For
example loading records indexed by timestamp and then deleting all but the
first record of the day.

That kind of pattern does need a regular reindex because the index pages will
have those few values left on them preventing them from being reused.

If that's not your usage pattern then perhaps you should describe your usage
pattern in more detail. But I suspect you would be best served by simply
vacuuming much more often.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-24 Thread Phoenix Kiula
On 25/09/2007, Vivek Khera [EMAIL PROTECTED] wrote:

 Recommending I run vacuum intermixed with the data purge is a non-
 starter; the vacuum on these tables takes a couple of hours.  I'd
 never finish purging my data with that kind of delay.

...
 I will investigate the fill-factor.  That seems like it may make some
 sense the way I do inserts and updates...


Undoubtedly.

But if most of your indexed keys are gone, then a reindex is useful.
If this is a hugely live system and you don't have a great number of
indexes, then a somewhat kludgish way to try could be to create a copy
of the table, do what you wish with it (delete rows, index them, then
cluster them on that index)...and whenever the process finishes (3
hours, or 3 days...no matter, because it doesn't hurt your live
system), you simply rename the old table to TABLE_OLD and the new
table to TABLE. The renaming operation is instant.

Anyway, what is your maintenance_work_mem? Try increasing your
maintenance_work_mem and see if that helps vacuuming first. Vacuum
operations can be sped up dramatically. We need regular vacuums and
that is critical to our application, so I have a m_w_m of 512K.

Mind you -- even if your DB vacuums for a couple hours, vacuum doesn't
affect the performance of your live system while it is happening, so
frequent vacuuming cannot hurt you one way or another, and it can
surely help.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Phoenix Kiula
On 19/09/2007, Gregory Williamson [EMAIL PROTECTED] wrote:

...

  Can't speak directly to PostgreSQL but in Informix the fill factor is
 useful for tweaking indexes. A very high fill factor is useful for tables
 that are static -- any inserts or changes to the index trigger a *lot* of
 moving of b-tree branches. But the high fill factor means that each page has
 more useful data references in it. A very low fill factor means that pages
 are sparse and so inserts and updates are less likely to trigger massive
 b-tree rebalancings.



Assuming pgsql's fill factor is similar to Informix' (yes, a highly
suspect assumption), could we say:

1. A small fill factor such as 10 or 20 would be good for the index
size and will not trigger massive btree rebalancings? (I'm first
playing with a value of 60 for now and seeing how it works out...seems
ok at the moment!)

2. Is this fill factor enough to have on the table, or should I also
do a fill factor for specific indexes? Or both the table and the
index? (I have four btree indexes on the table)

Thanks

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Phoenix Kiula
On 19/09/2007, Richard Broersma Jr [EMAIL PROTECTED] wrote:
 --- Phoenix Kiula [EMAIL PROTECTED] wrote:

  2. Is this fill factor enough to have on the table, or should I also
  do a fill factor for specific indexes? Or both the table and the
  index? (I have four btree indexes on the table)

 I don't think that fill factor can be applied to the table.  The CREATE TABLE 
 reference doc show
 that fill factor can be used in the CREATE TABLE statement, but it is only 
 applied to syntax that
 creates an implied index.

 i.e.  CREATE TABLE test (
 test_idINTEGER   PRIMARY KEY WITH ( FILLFACTOR = 70 ),
 test_val   TEXT );

 Primary key will create an implied index.  Fill factor is applied to that 
 implied index.

 Regards,
 Richard Broersma Jr.




Then I am confused again about how the fill factor stuff works. Let's
say I have a table with four BTREE indexes. Should all of them have a
fill factor of about 60 (lower than the default 90, that is) to be
effective? Or will it help if I lower the fill factor on only a couple
of the most often used ones? The primary key index is very, very
rarely updated so I don't need it to have a fill factor.

I could try and see these one by one, but that's no better than
touching/feeling the database blind-folded. I would love to see some
writeup about this whole thing, but it seems hard to come by!

Many thanks

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Richard Broersma Jr
--- Phoenix Kiula [EMAIL PROTECTED] wrote:

 Then I am confused again about how the fill factor stuff works. Let's
 say I have a table with four BTREE indexes. Should all of them have a
 fill factor of about 60 (lower than the default 90, that is) to be
 effective? Or will it help if I lower the fill factor on only a couple
 of the most often used ones? The primary key index is very, very
 rarely updated so I don't need it to have a fill factor.
 
 I could try and see these one by one, but that's no better than
 touching/feeling the database blind-folded. I would love to see some
 writeup about this whole thing, but it seems hard to come by!

I will try to explain everything that I understand about indexes and tables.  I 
am sure that if
some part of my understanding is incorrect, someone will chime in to correct me.

In PostgreSQL, tables are physically distinct from indexes.  This means that 
any give table is
written to disk as a file(s), and indexes are also written to disk as a 
separate file(s).  A table
and index are both saved to disk in segmented block referred to a pages (I 
believe the default
size is 8K).  

The advantage of the index file is that it is significantly smaller in size, so 
it takes less time
to sequentially scan and less time to read from disk.

Now when you want to find a record, PostgreSQL will/(may choose to) 
sequentially scan the index
until it find the record entry that corresponds with your criteria.  This entry 
has a table
cross-reference to the actual page that contains the record that is pointed 
at by the index.
Lastly, the entire table page containing your record is read from disk to 
memory for further query
processing.

ASSUMPTION
When you insert a record into a table that generates an entry into the b-tree 
index file,
PostgreSQL will scan the pages of the index file to find the correct place and 
index page to add
this entry.  If the page is already full, PostgreSQL probably replaces the 
old full pages with
two new pages with a distribution of that chunk of the B-tree index, and then 
adds the new entry
to one of those pages. This operation can become very expensive if many new 
pages need to be
created from single INSERT/UPDATE statement.
/ASSUMPTION

By using fill factor, you are telling PostgreSQL to automatically leave a 
portion of any newly
created index page partially blank for future use.  When a newly created index 
entry needs to be
saved, it can be stored in one of the holes left in the index page.  

A large fill factor create both advantages and dis-advantages.  For writing 
operations, it is a
big advantage because, a large fill factor will leave alot of holes for INSERT 
and UPDATE
operations to use.  This can help increase the number of UPDATE/INSERT per 
seconds that you server
can handle since, they index pages do not have to be rebuilt very often.

However, the disadvantage is that, a newly created index with a large fill 
factor has index
bloat designed into it.  This mean that the index pages have a large portion 
of holes.  So
PostgreSQL will create more index pages than it would normally in order to hold 
both your index
and the pre-defined holes that you specified by your fill-factor.  Larger 
indexes require more
time to be read from disk to memory and will require more time to sequentially 
scan to find to
find the cross-reference table page location of records of interest.  So the 
net effect is that
larger indexes will make SELECT statement slower.

This is my understanding for tables indexes and fill factor. I hope it helps.

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Albe Laurenz
Phoenix Kiula wrote:
 Then I am confused again about how the fill factor stuff works. Let's
 say I have a table with four BTREE indexes. Should all of them have a
 fill factor of about 60 (lower than the default 90, that is) to be
 effective? Or will it help if I lower the fill factor on only a couple
 of the most often used ones? The primary key index is very, very
 rarely updated so I don't need it to have a fill factor.
 
 I could try and see these one by one, but that's no better than
 touching/feeling the database blind-folded. I would love to see some
 writeup about this whole thing, but it seems hard to come by!

Let me present my understanding; I hope I won't confuse issues further.

Indexes and tables are bath organized in pages, each page contains
several
entries or rows.

When an INSERT or UPDATE on the table occurs, a new row (version) is
created (and the old version of the row will be freed upon VACUUM).

Any index entry that points to this row will have to be changed
because the location of the row has changed.
That means that there will also have to be a new entry in the index,
even if the key has not changed (simply modifying the existing
index entry to point to the new row location won't do, because there
may be transactions that still need the old version of the row).

Reducing fillfactor on tables (default 100) will reduce the number
of table pages that need to be touched during an UPDATE.

Reducing fillfactor on a B-tree index (default 90) will reduce the
frequency of page splits that can happen upon INSERT or UPDATE.

Both at the cost of wasting some disk (and memory) space.

So I *guess* that when you decide that a table will be heavily updated
and you want to reduce disk I/O at the cost of wasting some space,
it will be a good idea to reduce fillfactor on the table and all its
indexes.

I emphasize the guess because
a) I may have made a mistake in my deductions :^) and
b) I cannot tell you good numbers to choose.

As in most performance tuning questions, the best thing you can probably
is to test and compare various settings and see which performs best
for you

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Erik Jones

On Sep 19, 2007, at 9:29 AM, Richard Broersma Jr wrote:


--- Phoenix Kiula [EMAIL PROTECTED] wrote:


Then I am confused again about how the fill factor stuff works. Let's
say I have a table with four BTREE indexes. Should all of them have a
fill factor of about 60 (lower than the default 90, that is) to be
effective? Or will it help if I lower the fill factor on only a  
couple

of the most often used ones? The primary key index is very, very
rarely updated so I don't need it to have a fill factor.

I could try and see these one by one, but that's no better than
touching/feeling the database blind-folded. I would love to see some
writeup about this whole thing, but it seems hard to come by!


I will try to explain everything that I understand about indexes  
and tables.  I am sure that if
some part of my understanding is incorrect, someone will chime in  
to correct me.


In PostgreSQL, tables are physically distinct from indexes.  This  
means that any give table is
written to disk as a file(s), and indexes are also written to disk  
as a separate file(s).  A table
and index are both saved to disk in segmented block referred to a  
pages (I believe the default

size is 8K).


Yes, that can be changed at compile time, although I  don't think  
I've ever heard of any advantages to doing that.




The advantage of the index file is that it is significantly smaller  
in size, so it takes less time

to sequentially scan and less time to read from disk.

Now when you want to find a record, PostgreSQL will/(may choose to)  
sequentially scan the index
until it find the record entry that corresponds with your  
criteria.  This entry has a table
cross-reference to the actual page that contains the record that is  
pointed at by the index.
Lastly, the entire table page containing your record is read from  
disk to memory for further query

processing.

ASSUMPTION
When you insert a record into a table that generates an entry into  
the b-tree index file,
PostgreSQL will scan the pages of the index file to find the  
correct place and index page to add
this entry.  If the page is already full, PostgreSQL probably  
replaces the old full pages with
two new pages with a distribution of that chunk of the B-tree  
index, and then adds the new entry
to one of those pages. This operation can become very expensive if  
many new pages need to be

created from single INSERT/UPDATE statement.
/ASSUMPTION


Yes, the point of B-trees is that they have fast lookup times, but  
updates can be expensive when you have to re-balance your leaf nodes.


By using fill factor, you are telling PostgreSQL to automatically  
leave a portion of any newly
created index page partially blank for future use.  When a newly  
created index entry needs to be

saved, it can be stored in one of the holes left in the index page.


That future use is only for updates.



A large fill factor create both advantages and dis-advantages.  For  
writing operations, it is a
big advantage because, a large fill factor will leave alot of holes  
for INSERT and UPDATE
operations to use.  This can help increase the number of UPDATE/ 
INSERT per seconds that you server
can handle since, they index pages do not have to be rebuilt very  
often.


Not quite.  Once a page has reached it's fill factor percentage full,  
no more inserts will happen on that page, only updates.  Also, I  
think you have large/small backwards wrt fill factor.  If  you have a  
fill factor of, say, 40% then once a page has reached 40% full no  
more inserts will happen (unless some space is reclaimed by vacuum).   
So, smaller fill factors == bigger holes.  The bigger the fill  
factor, the smaller the whole:  if you have a fill factor of 90%,  
only 10% is reserved for updates of rows on that page.




However, the disadvantage is that, a newly created index with a  
large fill factor has index
bloat designed into it.  This mean that the index pages have a  
large portion of holes.  So
PostgreSQL will create more index pages than it would normally in  
order to hold both your index
and the pre-defined holes that you specified by your fill-factor.   
Larger indexes require more
time to be read from disk to memory and will require more time to  
sequentially scan to find to
find the cross-reference table page location of records of  
interest.  So the net effect is that

larger indexes will make SELECT statement slower.

This is my understanding for tables indexes and fill factor. I hope  
it helps.


Again, with the large v. small fill factor point.  Using fill factor  
seems to be a trade-off between space and update efficiency.  Let's  
say that after so many (potential) updates you know that each row  
will become static, i.e. no more updates will happen.  Let's use some  
numbers and make them easy to work with.  Say each row will be  
updated exactly once and you use a fill factor of 50%.  Now, say 5K  
of fresh index data is written.  The first 4K will go into one page  
at which 

Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Richard Broersma Jr
 Not quite.  Once a page has reached it's fill factor percentage full,  
 no more inserts will happen on that page, only updates.  Also, I  
 think you have large/small backwards wrt fill factor.  If  you have a  
 fill factor of, say, 40% then once a page has reached 40% full no  
 more inserts will happen (unless some space is reclaimed by vacuum).   
 So, smaller fill factors == bigger holes.  The bigger the fill  
 factor, the smaller the whole:  if you have a fill factor of 90%,  
 only 10% is reserved for updates of rows on that page.

So (just to reiterate), fill factor can be applied to both a table and/or an 
index(es).  But the
holes built into the page of a table or index can only be filled by UPDATE 
Statements.

Thanks for the clarification!

Regards,

Richard Broesma Jr.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Richard Broersma Jr
--- Phoenix Kiula [EMAIL PROTECTED] wrote:

 2. Is this fill factor enough to have on the table, or should I also
 do a fill factor for specific indexes? Or both the table and the
 index? (I have four btree indexes on the table)

I don't think that fill factor can be applied to the table.  The CREATE TABLE 
reference doc show
that fill factor can be used in the CREATE TABLE statement, but it is only 
applied to syntax that
creates an implied index.

i.e.  CREATE TABLE test (
test_idINTEGER   PRIMARY KEY WITH ( FILLFACTOR = 70 ),
test_val   TEXT );

Primary key will create an implied index.  Fill factor is applied to that 
implied index.

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Erik Jones


On Sep 19, 2007, at 10:30 AM, Richard Broersma Jr wrote:


Not quite.  Once a page has reached it's fill factor percentage full,
no more inserts will happen on that page, only updates.  Also, I
think you have large/small backwards wrt fill factor.  If  you have a
fill factor of, say, 40% then once a page has reached 40% full no
more inserts will happen (unless some space is reclaimed by vacuum).
So, smaller fill factors == bigger holes.  The bigger the fill
factor, the smaller the whole:  if you have a fill factor of 90%,
only 10% is reserved for updates of rows on that page.


So (just to reiterate), fill factor can be applied to both a table  
and/or an index(es).  But the
holes built into the page of a table or index can only be filled  
by UPDATE Statements.


Thanks for the clarification!


Yep.  Although, to be strictly honest, I guess the term UPDATE isn't  
the best term to use for indexes.  My description works best for  
tables, see the section on FILLFACTOR in http://www.postgresql.org/ 
docs/8.2/interactive/sql-createindex.html for a better description of  
what happens for indexes -- slightly different semantics, but the  
same general effect.


Also, note that once we have HOT, figuring out fill factor for  
indexes will be a whole different ball game.  Currently, an update to  
any tuple in a table, results in a new index entry.  With hot, index  
entries will only happen if the indexed column is changed in the update.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate  market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Richard Broersma Jr
--- Erik Jones [EMAIL PROTECTED] wrote:

 Also, note that once we have HOT...

I am not sure what the acronym HOT stands for.  Does it have something to do 
with MVCC?

Regards,
Richard Broersma Jr.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-19 Thread Erik Jones

On Sep 19, 2007, at 11:00 AM, Richard Broersma Jr wrote:


--- Erik Jones [EMAIL PROTECTED] wrote:


Also, note that once we have HOT...


I am not sure what the acronym HOT stands for.  Does it have  
something to do with MVCC?




Heap Only Tuple.  Here's a link to the (latest?) readme for it:  
http://archives.postgresql.org/pgsql-patches/2007-09/msg00261.php


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate  market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Ow Mun Heng wrote:
 On Mon, 2007-09-17 at 22:14 -0700, Joshua D. Drake wrote:
 Phoenix Kiula wrote:
 So a YES/NO question:

 Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I
 reindex/cluster indexes?
 If you overrun your max_fsm_pages, no:
   else yes;
 
 Maybe my english suck, but I don't understand the above answer.
 
 If I overrun my Max_FSM_pages then a vacuum analyse is enough to return
 it back to normal.

If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If
you do not overrun your max_fsm_pages, yes vacuum analyze can deal with
the issue.

Joshua D. Drake

 
 If I _didn't_ overrun my fsm, then a reindex/cluster is necessary.
 
 Did I get that right? (I feel it's wrong and a reindex/cluster is needed
 only when I overrun my max_fsm)
 
 
 


- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG72kwATb/zqfZUUQRAqNMAJsFjqWirgGF+VlEIwaVDnxBAefeSwCfesD1
osqiudjcEY/tyibvNZRJ/UU=
=apjz
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Filip Rembiałkowski
2007/9/18, Joshua D. Drake [EMAIL PROTECTED]:

 If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If
 you do not overrun your max_fsm_pages, yes vacuum analyze can deal with
 the issue.

Are you sure? I have a situation where above is no true. postgres
version 8.1.8. while vacuum verbose says:

INFO:  free space map contains 2329221 pages in 490 relations
DETAIL:  A total of 2345744 page slots are in use (including overhead).
2345744 page slots are required to track all free space.
Current limits are:  1000 page slots, 1000 relations, using 58698 KB.

... and we have constant problem with index bloat and need to REINDEX
frequently.

the database is very redundant and has quite hight data retention rate
(it's an ecommerce site)


-- 
Filip Rembiałkowski

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Ow Mun Heng
On Tue, 2007-09-18 at 06:01 -0400, Bill Moran wrote:
 * (with newer version) reduce the fill factor and REINDEX

What is fill factor?

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Bill Moran
Filip Rembiałkowski [EMAIL PROTECTED] wrote:

 2007/9/18, Joshua D. Drake [EMAIL PROTECTED]:
 
  If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If
  you do not overrun your max_fsm_pages, yes vacuum analyze can deal with
  the issue.
 
 Are you sure? I have a situation where above is no true. postgres
 version 8.1.8. while vacuum verbose says:
 
 INFO:  free space map contains 2329221 pages in 490 relations
 DETAIL:  A total of 2345744 page slots are in use (including overhead).
 2345744 page slots are required to track all free space.
 Current limits are:  1000 page slots, 1000 relations, using 58698 KB.
 
 ... and we have constant problem with index bloat and need to REINDEX
 frequently.
 
 the database is very redundant and has quite hight data retention rate
 (it's an ecommerce site)

I've been involved in a number of the discussions on this, and I think
part of the confusion stems from the fact that index bloat is an
ambiguous term.

If the index gets large enough that it no longer fits in shared memory,
and reindexing it will reduce its size to where it _will_ fit in shared
memory, then the index _could_ be said to be bloated.

However, an equally valid solution to that problem is to increase the
amount of shared memory available (possibly by adding RAM).

Unfortunately, folks like Phoenix are looking for yes/no answers, and
with many of these questions, the _correct_ answer is it depends on
your workload

If you find that reindexing improves performance, then you should
investigate further.  Depending on the exact nature of the problem,
there are many possible solutions, three that come to mind:
* Add RAM/SHM
* REINDEX on a regular schedule
* (with newer version) reduce the fill factor and REINDEX

-- 
Bill Moran
http://www.potentialtech.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Rodrigo De León
On 9/18/07, Ow Mun Heng [EMAIL PROTECTED] wrote:
 On Tue, 2007-09-18 at 06:01 -0400, Bill Moran wrote:
  * (with newer version) reduce the fill factor and REINDEX

 What is fill factor?

See Index Storage Parameters:
http://www.postgresql.org/docs/8.2/static/sql-createindex.html

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Sander Steffann
Hi,

 Can I add SHM with merely by managing the entry in sysctl.conf? My
 current values:
 
 kernel.shmmax = 536870912
 kernel.shmall = 536870912
 
 My shared_buffers in postgresql.conf is 2. From the website
 http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax
 should be sharedbuffer*8192, so I suppose my shmmax can be much lower
 than the above, but I raised it for performance. Am I wrong to do so?

You need to configure the kernel so it allows processes to use more shared
memory. This does not mean that a process automatically uses it. For
PostgreSQL you will need to increase shared_buffers to make it use the extra
available shared memory. With your shared memory settings you can probably
increase shared_buffers to about 65000.

With the 'ipcs' command you can see how much shared memory PostgreSQL uses.
Look under 'Shared Memory Segments' to memory owned by user postgres.

- Sander



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Phoenix Kiula
 Unfortunately, folks like Phoenix are looking for yes/no answers, and
 with many of these questions, the _correct_ answer is it depends on
 your workload


I wanted merely to simplify the advice that gets dispensed on this
list, often conflicting to novice ears like mine. So I appreciate your
notes very much.


 If you find that reindexing improves performance, then you should
 investigate further.  Depending on the exact nature of the problem,
 there are many possible solutions, three that come to mind:
 * Add RAM/SHM


Can I add SHM with merely by managing the entry in sysctl.conf? My
current values:

kernel.shmmax = 536870912
kernel.shmall = 536870912

My shared_buffers in postgresql.conf is 2. From the website
http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax
should be sharedbuffer*8192, so I suppose my shmmax can be much lower
than the above, but I raised it for performance. Am I wrong to do so?


 * REINDEX on a regular schedule


This is sadly not really feasible, because we need to offer a 100%
availability website. REINDEX does not work concurrently so it is not
really an option for us. My max_fsm_pages and max_fsm_relations are
way above the numbers that come up after the VACUUM ANALYZE VERBOSE
run.

But still, the autovacuum stuff seems like it is not working at all.
Some related entries in the conf file:

autovacuum   = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay= 20
autovacuum_naptime   = 30
stats_start_collector= on
stats_row_level  = on
autovacuum_vacuum_threshold  = 80
autovacuum_analyze_threshold = 80

And yet, the db often slows down, at which point I manually login and
run a manual VACUUM ANALYZE and it seems fine for some more time.
Sometimes, I also restart pgsql and that seems to help for a while.

Another advice on these forums is to see vmstat 1, without actually
specifying how to draw inferences from it. The free version of it is
coming up at decent rates, as follows:


procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
 1  0  29124 110760 108980 346773601   206   1400 4  2  1 85 12
 0  0  29124 110632 108980 346773600 0 0 1052   108  0  0 100  0
 2  0  29124 108840 108980 346773600 0 0 1112   299  1  1 98  0
 1  0  29124 109288 108980 346773600 0 0 1073   319  2  1 98  0
.




 * (with newer version) reduce the fill factor and REINDEX



I think some of my tables are updated very frequently so a smaller
fill factor will be nice. How can I find the current fill factor on my
tables? Also, is there some method or science to calculating a decent
fill factor -- size of table, number of indexes, frequency of updates,
and such? We have one major table which faces a lot of INSERTs and
UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10
million).

Thanks.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Sander Steffann
Hi,

 Now, I can merrily increase the shared_buffers, but the manual warns
 me against increasing the value too much because it is per
 transaction value.

Shared_buffers is not per-transaction. Where did you find this information?

- Sander



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Phoenix Kiula
On 18/09/2007, Sander Steffann [EMAIL PROTECTED] wrote:
 Hi,

  Can I add SHM with merely by managing the entry in sysctl.conf? My
  current values:
 
  kernel.shmmax = 536870912
  kernel.shmall = 536870912
 
  My shared_buffers in postgresql.conf is 2. From the website
  http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax
  should be sharedbuffer*8192, so I suppose my shmmax can be much lower
  than the above, but I raised it for performance. Am I wrong to do so?

 You need to configure the kernel so it allows processes to use more shared
 memory. This does not mean that a process automatically uses it. For
 PostgreSQL you will need to increase shared_buffers to make it use the extra
 available shared memory. With your shared memory settings you can probably
 increase shared_buffers to about 65000.



Thanks, the IPCS command shows me this:


-- Shared Memory Segments 
keyshmid  owner  perms  bytes  nattch status
0x0052e2c1 6782976postgres  600176668672  2


Now, I can merrily increase the shared_buffers, but the manual warns
me against increasing the value too much because it is per
transaction value.

So here's the conflict for a novice like me:

1. Do not increase shared_buffer too much because it is per-transaction.
2. Do increase the SHM for performance, but it is only useful if you
also increase shared_buffer.

So which is it?

Would it help to increase the effective_cache_size? It is currently at
512000.

I have 4GB ram on the machine, but am willing to devote about 2GB to pgsql.

Thanks!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Bill Moran
In response to Phoenix Kiula [EMAIL PROTECTED]:

  If you find that reindexing improves performance, then you should
  investigate further.  Depending on the exact nature of the problem,
  there are many possible solutions, three that come to mind:
  * Add RAM/SHM
 
 Can I add SHM with merely by managing the entry in sysctl.conf? My
 current values:
 
 kernel.shmmax = 536870912
 kernel.shmall = 536870912

These values define the max allowed.  They exist to keep poorly written
applications from sucking up all the available memory.  Setting them
higher than is needed does not cause any problems, unless a greedy or
poorly-written application grabs all that memory.

 My shared_buffers in postgresql.conf is 2. From the website
 http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax
 should be sharedbuffer*8192, so I suppose my shmmax can be much lower
 than the above, but I raised it for performance. Am I wrong to do so?

It's completely impossible to tell without knowing more about your
physical hardware.  The rule of thumb is 1/3 physical RAM to start, then
adjust if more or less seems to help.  That advice is for versions of
PG = 8.  If you're still running a 7.X version, upgrade.

How much RAM does this system have in it?  Unless you have other
applications running on this system using RAM, you should allocate
more of it to shared_buffers.  If 160M is 1/3 your RAM, you probably
need to add more RAM.

How big is your database?  If it's possible to fit it all in
shared_buffers, that will give you the best performance.  

  * REINDEX on a regular schedule
 
 This is sadly not really feasible, because we need to offer a 100%
 availability website. REINDEX does not work concurrently so it is not
 really an option for us. My max_fsm_pages and max_fsm_relations are
 way above the numbers that come up after the VACUUM ANALYZE VERBOSE
 run.

Hence my comment about depending on your workload and investigating
the situation to determine the best solution.  

 But still, the autovacuum stuff seems like it is not working at all.
 Some related entries in the conf file:
 
 autovacuum   = on
 autovacuum_vacuum_cost_delay = 20
 vacuum_cost_delay= 20
 autovacuum_naptime   = 30
 stats_start_collector= on
 stats_row_level  = on
 autovacuum_vacuum_threshold  = 80
 autovacuum_analyze_threshold = 80
 
 And yet, the db often slows down, at which point I manually login and
 run a manual VACUUM ANALYZE and it seems fine for some more time.
 Sometimes, I also restart pgsql and that seems to help for a while.

You don't mention *_scale_factor settings.  Those are going to be
important as well.  Based on your symptoms, it sounds like autovacuum
is not getting those tables vacuumed enough.  I recommend raising the
debug level and watching the logs to see if autovacuum is actually
getting tables vacuumed.  Consider lowering your *_scale_factor values
if not.  Or even reducing autovacuum_naptime.

 Another advice on these forums is to see vmstat 1, without actually
 specifying how to draw inferences from it. The free version of it is
 coming up at decent rates, as follows:
 
 
 procs ---memory-- ---swap-- -io --system-- cpu
  r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
  1  0  29124 110760 108980 346773601   206   1400 4  2  1 85 
 12
  0  0  29124 110632 108980 346773600 0 0 1052   108  0  0 100 
  0
  2  0  29124 108840 108980 346773600 0 0 1112   299  1  1 98   0
  1  0  29124 109288 108980 346773600 0 0 1073   319  2  1 98   0
 .

Explaining how to interpret the output of this command and determine
what to do with it is not something easily done in a short paragraph.
However, it looks like you've got a lot of RAM being used for the disk
cache.  That memory would probably be better used as shared_buffers, so
I suggest you increase that value considerably.

  * (with newer version) reduce the fill factor and REINDEX
 
 I think some of my tables are updated very frequently so a smaller
 fill factor will be nice. How can I find the current fill factor on my
 tables? Also, is there some method or science to calculating a decent
 fill factor -- size of table, number of indexes, frequency of updates,
 and such? We have one major table which faces a lot of INSERTs and
 UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10
 million).

I'm not sure how to find the current value, but a smaller fill factor
on busy tables should lead to less fragmentation, thus more efficient
indexes over time.  Keep in mind that a smaller fill factor will also
lead to larger indexes initially.

-- 
Bill Moran
http://www.potentialtech.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


index fillfactor (was Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER)

2007-09-18 Thread Bill Moran
In response to Phoenix Kiula [EMAIL PROTECTED]:

 Thanks for a very informative post! One question:
 
  I'm not sure how to find the current value, but a smaller fill factor
  on busy tables should lead to less fragmentation, thus more efficient
  indexes over time.  Keep in mind that a smaller fill factor will also
  lead to larger indexes initially.
 
 What constitutes a small fill factor? Would 70 be good?

Unfortunately, I can't say.  I have not yet had the opportunity to
experiment with different fillfactors, so I can only speak in vague
estimations on this topic.

 I guess my
 current must have been the default, which the manual says is 100.

I expect it's at the default, but the docs say that is 90%:
http://www.postgresql.org/docs/8.2/static/sql-createindex.html

Where did you see 100?

 Or
 did you mean really small fill factor like 20? In this context, what
 is packing in the manual -- is that some kind of compression?

Hopefully, someone more knowledgeable will chime in with some wise
suggestions.  Barring that, I can only suggest you experiment to find
what works for your workload, but don't rule out the possibility that
extremely low fillfactor values might work well for you.

-- 
Bill Moran
http://www.potentialtech.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Richard Broersma Jr
--- Phoenix Kiula [EMAIL PROTECTED] wrote:

 What constitutes a small fill factor? Would 70 be good? I guess my
 current must have been the default, which the manual says is 100.

On the following link:

http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS

I found this:

B-trees use a default fillfactor of 90, but any value from 10 to 100 can be 
selected.

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Phoenix Kiula
Thanks for a very informative post! One question:


 I'm not sure how to find the current value, but a smaller fill factor
 on busy tables should lead to less fragmentation, thus more efficient
 indexes over time.  Keep in mind that a smaller fill factor will also
 lead to larger indexes initially.


What constitutes a small fill factor? Would 70 be good? I guess my
current must have been the default, which the manual says is 100. Or
did you mean really small fill factor like 20? In this context, what
is packing in the manual -- is that some kind of compression?

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Richard Broersma Jr

--- Gregory Williamson [EMAIL PROTECTED] wrote:

 A very low fill factor means that pages are
 sparse and so inserts and updates are less likely to trigger massive b-tree 
 rebalancings. 

I take it that massive b-tree rebalancings could cause a problem with the 
performance of disk
writing though-put from UPDATEs and INSERTs?

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Greg Williamson

Richard Broersma Jr wrote:

--- Gregory Williamson [EMAIL PROTECTED] wrote:

  

A very low fill factor means that pages are
sparse and so inserts and updates are less likely to trigger massive b-tree rebalancings. 



I take it that massive b-tree rebalancings could cause a problem with the 
performance of disk
writing though-put from UPDATEs and INSERTs?

Regards,
Richard Broersma Jr.
  
Precisely -- even if it can keep everything in RAM it can occupy quite a 
few cycles to rebalance a large b-tree. And eventually those changes do 
need to get written to disk so the next checkpoint (I think) will also 
have more work.

G

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-18 Thread Gregory Williamson
Sorry for top-posting -- challenged reader.

Can't speak directly to PostgreSQL but in Informix the fill factor is useful 
for tweaking indexes. A very high fill factor is useful for tables that are 
static -- any inserts or changes to the index trigger a *lot* of moving of 
b-tree branches. But the high fill factor means that each page has more useful 
data references in it. A very low fill factor means that pages are sparse and 
so inserts and updates are less likely to trigger massive b-tree rebalancings. 

I've never used it on PostgreSQL (yet!) but am looking forward to it.

Beware of premature optimization!

HTH,

Greg Williamson
Senior DBA
GlobeXplorer LLC, a DigitalGlobe company

Confidentiality Notice: This e-mail message, including any attachments, is for 
the sole use of the intended recipient(s) and may contain confidential and 
privileged information and must be protected in accordance with those 
provisions. Any unauthorized review, use, disclosure or distribution is 
prohibited. If you are not the intended recipient, please contact the sender by 
reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)



-Original Message-
From: [EMAIL PROTECTED] on behalf of Richard Broersma Jr
Sent: Tue 9/18/2007 10:29 AM
To: Phoenix Kiula; Bill Moran
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
 
--- Phoenix Kiula [EMAIL PROTECTED] wrote:

 What constitutes a small fill factor? Would 70 be good? I guess my
 current must have been the default, which the manual says is 100.

On the following link:

http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS

I found this:

B-trees use a default fillfactor of 90, but any value from 10 to 100 can be 
selected.

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match



Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-17 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Phoenix Kiula wrote:
 The manual is vague. Several threads about this, in language that is
 ambiguous to me.
 
 So a YES/NO question:
 
 Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I
 reindex/cluster indexes?

If you overrun your max_fsm_pages, no:
  else yes;

Sincerely,

Joshua D. Drake

 
 Thanks.
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 


- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG717QATb/zqfZUUQRAh6uAJ9CGXbA2BxXvMbSZP9Gv8gI9QBkXgCePhqe
6aS3fp60g7YrWECspTVcxyE=
=u2o/
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

2007-09-17 Thread Ow Mun Heng
On Mon, 2007-09-17 at 22:14 -0700, Joshua D. Drake wrote:
 Phoenix Kiula wrote:
  So a YES/NO question:
  
  Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I
  reindex/cluster indexes?
 
 If you overrun your max_fsm_pages, no:
   else yes;

Maybe my english suck, but I don't understand the above answer.

If I overrun my Max_FSM_pages then a vacuum analyse is enough to return
it back to normal.

If I _didn't_ overrun my fsm, then a reindex/cluster is necessary.

Did I get that right? (I feel it's wrong and a reindex/cluster is needed
only when I overrun my max_fsm)




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings