Re: [GENERAL] How often do I need to reindex tables?

2007-04-19 Thread Bill Moran
In response to Tom Lane [EMAIL PROTECTED]:

 Bill Moran [EMAIL PROTECTED] writes:
  Just an FYI ... I remembered what prompted the cron job.
 
  We were seeing significant performance degradation.  I never did actual
  measurements, but it was on the order of Bill, why is restoring taking
  such a long time? from other systems people.  At the time, I poked around
  and tried some stuff here and there and found that reindex restored
  performance.  I didn't look at actual size at that time.
 
 A reindex might improve performance for reasons other than bloat --- to
 wit, that a freshly-built index is in perfect physical order, which
 tends to get degraded over time by page splits.  How important that is
 depends on your usage patterns.  If this is what the story is for your
 situation, then what might fix it (in 8.2) is to create the index with
 FILLFACTOR 50 or so, so that it's already at the steady state density
 and won't need many page splits.
 
  Anyway, I'll report back in a few weeks as to what the numbers look like.
 
 Yeah, please for the moment just watch what happens with the default
 behavior.

Remember this discussion?

To recap, I had scheduled a weekly reindex of this database because I
was seeing performance issues otherwise.  In order to see if this was
actually helping, I disabled the redindex job, ran a few timing
experiments, then scheduled a job to email me the size of the indexes
in the database on a daily basis.

At this point, I have daily records of index size since March 6th.

The behaviour is like this:  A freshly created index is about 21,000
pages in size.  Under normal usage, the index size balloons to about
38,000 pages immediately after the first backup job is run.  From there
it grows slowly (but fairly consistently) by about 100 pages each day.
As of today, it is 44304 pages.

When I first brought up this discussion, the table contained 8068956
rows.  It now has 7451381, which means it's dropped by 7%

The important part is that I can't reproduce the performance problems
that I originally thought were the result of this.  It's entirely
possible that something else was changed since then that actually
fixed the problem, and that the index bloat was a red herring.

Not sure what (if any) conclusions can be drawn from this.  Is there
any other data I should gather?  Have I just proved my previous
rantings about the necessity of reindexing to be wrong?

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] How often do I need to reindex tables?

2007-04-19 Thread Martin Gainty

Bill and Tom

Best to find out what kind of index you want to create beforehand
If your data is evenly distributed and exhibits High Cardinality (2 entries 
for A,B,C...Z) then I would recommend a BTREE Index

If not (low cardinality scenarios such as gender) then create Bitmap Index
I cant speak for postgres but index creation will necessitate you to 
schedule time when you can bring DB offline (such as a weekend) as most DB 
will not allow a unique index to be created on a table while the table is in 
use

Also I find oracle books and online documentation very helpful specifically
http://otn.oracle.com
Books are available from Oracle Press

HTH
Martin
This email message and any files transmitted with it contain confidential
information intended only for the person(s) to whom this email message is
addressed.  If you have received this email message in error, please notify
the sender immediately by telephone or email and destroy the original
message without making a copy.  Thank you.

- Original Message - 
From: Bill Moran [EMAIL PROTECTED]

To: Tom Lane [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Sent: Thursday, April 19, 2007 9:33 AM
Subject: Re: [GENERAL] How often do I need to reindex tables?



In response to Tom Lane [EMAIL PROTECTED]:


Bill Moran [EMAIL PROTECTED] writes:
 Just an FYI ... I remembered what prompted the cron job.

 We were seeing significant performance degradation.  I never did actual
 measurements, but it was on the order of Bill, why is restoring taking
 such a long time? from other systems people.  At the time, I poked 
 around

 and tried some stuff here and there and found that reindex restored
 performance.  I didn't look at actual size at that time.

A reindex might improve performance for reasons other than bloat --- to
wit, that a freshly-built index is in perfect physical order, which
tends to get degraded over time by page splits.  How important that is
depends on your usage patterns.  If this is what the story is for your
situation, then what might fix it (in 8.2) is to create the index with
FILLFACTOR 50 or so, so that it's already at the steady state density
and won't need many page splits.

 Anyway, I'll report back in a few weeks as to what the numbers look 
 like.


Yeah, please for the moment just watch what happens with the default
behavior.


Remember this discussion?

To recap, I had scheduled a weekly reindex of this database because I
was seeing performance issues otherwise.  In order to see if this was
actually helping, I disabled the redindex job, ran a few timing
experiments, then scheduled a job to email me the size of the indexes
in the database on a daily basis.

At this point, I have daily records of index size since March 6th.

The behaviour is like this:  A freshly created index is about 21,000
pages in size.  Under normal usage, the index size balloons to about
38,000 pages immediately after the first backup job is run.  From there
it grows slowly (but fairly consistently) by about 100 pages each day.
As of today, it is 44304 pages.

When I first brought up this discussion, the table contained 8068956
rows.  It now has 7451381, which means it's dropped by 7%

The important part is that I can't reproduce the performance problems
that I originally thought were the result of this.  It's entirely
possible that something else was changed since then that actually
fixed the problem, and that the index bloat was a red herring.

Not sure what (if any) conclusions can be drawn from this.  Is there
any other data I should gather?  Have I just proved my previous
rantings about the necessity of reindexing to be wrong?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] How often do I need to reindex tables?

2007-03-02 Thread Bill Moran
In response to Tom Lane [EMAIL PROTECTED]:

 Bill Moran [EMAIL PROTECTED] writes:
  Just an FYI ... I remembered what prompted the cron job.
 
  We were seeing significant performance degradation.  I never did actual
  measurements, but it was on the order of Bill, why is restoring taking
  such a long time? from other systems people.  At the time, I poked around
  and tried some stuff here and there and found that reindex restored
  performance.  I didn't look at actual size at that time.
 
 A reindex might improve performance for reasons other than bloat --- to
 wit, that a freshly-built index is in perfect physical order, which
 tends to get degraded over time by page splits.  How important that is
 depends on your usage patterns.

This goes back to the heart of the original question, which was: how often
do I need to reindex.  The answer was rarely, if ever and if you're
really worried about this, you can monitor _size_ via these queries ...

I guess I focused too much on size in my response.  As you point out,
bloat isn't the only indicator that an index would benefit from being
rebuilt.

 If this is what the story is for your
 situation, then what might fix it (in 8.2) is to create the index with
 FILLFACTOR 50 or so, so that it's already at the steady state density
 and won't need many page splits.

Interesting.  So a major factor in performance degradation is when the
index has to split pages.  I read about FILLFACTOR in the docs, but it
didn't click as to what use it was until your statement.

We're still evaluating 8.2.  We've hit a few issues with our application
and plpgsql, but I think those are minor.  As far as deploying it for
our Bacula systems -- I just need to find the time to be sure that it
doesn't introduce any problems, but I suspect there's a low chance of
that with Bacula.

  Anyway, I'll report back in a few weeks as to what the numbers look like.
 
 Yeah, please for the moment just watch what happens with the default
 behavior.

Yup.

-- 
Bill Moran
http://www.potentialtech.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] How often do I need to reindex tables?

2007-03-02 Thread Jeff Davis
On Wed, 2007-02-28 at 09:17 -0800, Joshua D. Drake wrote:
 Bill, you are right but I believe Jim was speaking from a general
 perspective. Generally speaking you should not have to reindex, or if
 you do very rarely.
 
 I too have a couple of databases we manage that require a reindex more
 often than what would be considered normal, but a reindex is far from
 the norm itself.
 

Isn't a REINDEX still needed in the case of monotonically increasing
keys, such as in a sequence or timestamp index? I also delete tuples, so
that results in a forward-shifting range of keys.

If this is not normal, I need to re-evaluate my autovacuum settings.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] How often do I need to reindex tables?

2007-03-02 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 Isn't a REINDEX still needed in the case of monotonically increasing
 keys, such as in a sequence or timestamp index? I also delete tuples, so
 that results in a forward-shifting range of keys.

No, that shouldn't be a problem, if you're maintaining a constant key
range width (that is, *all* the old entries get deleted).  The only
pattern I'm aware of that causes a problem is if you leave a small
subset of the keys behind, for instance insert every few minutes and
then later delete all but one entry per day.  In this situation you may
end up with an index containing as few as one entry per page.  We
don't have any mechanism short of REINDEX to collapse nonempty index
pages together, so that way lies bloat.  But if you delete all the old
entries then the pages get recycled and there shouldn't be a problem.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] How often do I need to reindex tables?

2007-03-02 Thread Jeff Davis
On Fri, 2007-03-02 at 16:39 -0500, Tom Lane wrote:
 Jeff Davis [EMAIL PROTECTED] writes:
  Isn't a REINDEX still needed in the case of monotonically increasing
  keys, such as in a sequence or timestamp index? I also delete tuples, so
  that results in a forward-shifting range of keys.
 
 No, that shouldn't be a problem, if you're maintaining a constant key
 range width (that is, *all* the old entries get deleted).  The only
 pattern I'm aware of that causes a problem is if you leave a small
 subset of the keys behind, for instance insert every few minutes and
 then later delete all but one entry per day.  In this situation you may
 end up with an index containing as few as one entry per page.  We
 don't have any mechanism short of REINDEX to collapse nonempty index
 pages together, so that way lies bloat.  But if you delete all the old
 entries then the pages get recycled and there shouldn't be a problem.
 

You just described this particular table, so I will need to continue
REINDEXing. It's getting maybe 10-50 inserts per second, and most expire
in an day. However, a small percentage hang around for much longer.

REINDEX isn't a problem for me, because there are periods of low usage.

I think if I really wanted to eliminate REINDEX I could move the few
remaining records into another table and have a view accross them. 

Regards,
Jeff Davis




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] How often do I need to reindex tables?

2007-03-01 Thread Vivek Khera


On Feb 28, 2007, at 5:35 PM, Bill Moran wrote:


Just an FYI ... I remembered what prompted the cron job.

We were seeing significant performance degradation.  I never did  
actual
measurements, but it was on the order of Bill, why is restoring  
taking
such a long time? from other systems people.  At the time, I poked  
around

and tried some stuff here and there and found that reindex restored
performance.  I didn't look at actual size at that time.


I have two huge tables (one tracks messages sent, one tracks URL  
click-throughs from said messages) from which I purge old data every  
few weeks.  The primary key indexes on these get bloated after a few  
months and performance goes way down like you observe.  A reindex  
fixes up the performance issues pretty well on those tables, and  
often shaves off a few gigs of disk space too.


We have to manually run the reindex because it has to be timed such  
that the service is not impacted (ie, run on major holiday weekends)  
and we have to take down part of the service and point other parts to  
backup servers, etc.  Not an easy chore...


This is on Pg 8.1.  Don't even ask me how it was in the 7.4 days when  
we have maybe 10% of the data! :-)





smime.p7s
Description: S/MIME cryptographic signature


Re: [GENERAL] How often do I need to reindex tables?

2007-03-01 Thread Tom Lane
Bill Moran [EMAIL PROTECTED] writes:
 Just an FYI ... I remembered what prompted the cron job.

 We were seeing significant performance degradation.  I never did actual
 measurements, but it was on the order of Bill, why is restoring taking
 such a long time? from other systems people.  At the time, I poked around
 and tried some stuff here and there and found that reindex restored
 performance.  I didn't look at actual size at that time.

A reindex might improve performance for reasons other than bloat --- to
wit, that a freshly-built index is in perfect physical order, which
tends to get degraded over time by page splits.  How important that is
depends on your usage patterns.  If this is what the story is for your
situation, then what might fix it (in 8.2) is to create the index with
FILLFACTOR 50 or so, so that it's already at the steady state density
and won't need many page splits.

 Anyway, I'll report back in a few weeks as to what the numbers look like.

Yeah, please for the moment just watch what happens with the default
behavior.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Ezequias Rodrigues da Rocha

This select doesn't return any row. What does it mean ?

Ezequias.

2007/2/27, Jim C. Nasby [EMAIL PROTECTED]:

On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:
 I am planning to use 8.2 and the average inserts/deletes and updates
 across all tables is moderate. That is, it is a moderate sized
 database with moderate usage of tables.

 Given that, how often do I need to reindex the tables? Do I need to do
 it everyday?

No, you should very rarely if ever need to do it.

If you're really concerned, I suggest monitoring average tuples per
index page; something like

SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND
relpages  1000;

That'll show tuples/page for all indexes over 8MB in size.

 Also with 8.2, I do not have to do vacuum anymore or that is what I
 understand. Does it do auto-vacuum?

You still need to enable autovacuum. See autovacuum_enable.
--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster




--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Atenciosamente (Sincerely)
   Ezequias Rodrigues da Rocha
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
A pior das democracias ainda é melhor do que a melhor das ditaduras
The worst of democracies is still better than the better of dictatorships
http://ezequiasrocha.blogspot.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Bill Moran
In response to Ezequias Rodrigues da Rocha [EMAIL PROTECTED]:
 2007/2/27, Jim C. Nasby [EMAIL PROTECTED]:
  On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:
   I am planning to use 8.2 and the average inserts/deletes and updates
   across all tables is moderate. That is, it is a moderate sized
   database with moderate usage of tables.
  
   Given that, how often do I need to reindex the tables? Do I need to do
   it everyday?
 
  No, you should very rarely if ever need to do it.

I don't agree.  I think that regular indexing is mandatory under some
workloads.  Example:
bacula=# select relname, relpages from pg_class where relkind='i' and relname 
not like 'pg_%' order by relname;
relname| relpages 
---+--
 basefiles_pkey|1
 cdimages_pkey |1
 client_name_idx   |2
 client_pkey   |2
 counters_pkey |1
 device_pkey   |1
 file_fp_idx   |41212
[...]

bacula=# reindex database bacula;
[...]

relname| relpages 
---+--
 basefiles_pkey|1
 cdimages_pkey |1
 client_name_idx   |2
 client_pkey   |2
 counters_pkey |1
 device_pkey   |1
 file_fp_idx   |21367
[...]

There are some additional indexes that I've snipped from the output that also
saw some benefit from reindexing, but let's just focus on file_fp_idx.

Please note that the database you're looking at is reindexed _weekly_ by a
cron job, which means the index bloat you're seeing in the above example is
the result of normal activity since last Saturday.

I've brought this up before, and I want to point it out again.  I really
think there are certain workloads that require reindexing.  Luckily for
this particular workload, it's easy to schedule a job to do so, since I
know when the backups aren't running :)

-- 
Bill Moran
Collaborative Fusion Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Joshua D. Drake
Bill Moran wrote:
 In response to Ezequias Rodrigues da Rocha [EMAIL PROTECTED]:
 2007/2/27, Jim C. Nasby [EMAIL PROTECTED]:
 On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:
 I am planning to use 8.2 and the average inserts/deletes and updates
 across all tables is moderate. That is, it is a moderate sized
 database with moderate usage of tables.

 Given that, how often do I need to reindex the tables? Do I need to do
 it everyday?
 No, you should very rarely if ever need to do it.
 
 I don't agree.  I think that regular indexing is mandatory under some
 workloads. 


Bill, you are right but I believe Jim was speaking from a general
perspective. Generally speaking you should not have to reindex, or if
you do very rarely.

I too have a couple of databases we manage that require a reindex more
often than what would be considered normal, but a reindex is far from
the norm itself.

Joshua D. Drake


 Example:
 bacula=# select relname, relpages from pg_class where relkind='i' and relname 
 not like 'pg_%' order by relname;
 relname| relpages 
 ---+--
  basefiles_pkey|1
  cdimages_pkey |1
  client_name_idx   |2
  client_pkey   |2
  counters_pkey |1
  device_pkey   |1
  file_fp_idx   |41212
 [...]
 
 bacula=# reindex database bacula;
 [...]
 
 relname| relpages 
 ---+--
  basefiles_pkey|1
  cdimages_pkey |1
  client_name_idx   |2
  client_pkey   |2
  counters_pkey |1
  device_pkey   |1
  file_fp_idx   |21367
 [...]
 
 There are some additional indexes that I've snipped from the output that also
 saw some benefit from reindexing, but let's just focus on file_fp_idx.
 
 Please note that the database you're looking at is reindexed _weekly_ by a
 cron job, which means the index bloat you're seeing in the above example is
 the result of normal activity since last Saturday.
 
 I've brought this up before, and I want to point it out again.  I really
 think there are certain workloads that require reindexing.  Luckily for
 this particular workload, it's easy to schedule a job to do so, since I
 know when the backups aren't running :)
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Tom Lane
Bill Moran [EMAIL PROTECTED] writes:
 I don't agree.  I think that regular indexing is mandatory under some
 workloads.  Example:
 ...
 There are some additional indexes that I've snipped from the output that also
 saw some benefit from reindexing, but let's just focus on file_fp_idx.

Can you describe the usage pattern of that index?  I'm curious why it
doesn't maintain reasonably static size.  How often is the underlying
table vacuumed?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Bill Moran
In response to Joshua D. Drake [EMAIL PROTECTED]:

 Bill Moran wrote:
  In response to Ezequias Rodrigues da Rocha [EMAIL PROTECTED]:
  2007/2/27, Jim C. Nasby [EMAIL PROTECTED]:
  On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:
  I am planning to use 8.2 and the average inserts/deletes and updates
  across all tables is moderate. That is, it is a moderate sized
  database with moderate usage of tables.
 
  Given that, how often do I need to reindex the tables? Do I need to do
  it everyday?
  No, you should very rarely if ever need to do it.
  
  I don't agree.  I think that regular indexing is mandatory under some
  workloads. 
 
 Bill, you are right but I believe Jim was speaking from a general
 perspective. Generally speaking you should not have to reindex, or if
 you do very rarely.
 
 I too have a couple of databases we manage that require a reindex more
 often than what would be considered normal, but a reindex is far from
 the norm itself.

Well, I hope I didn't come across as confrontation or anything, as that
wasn't my intent.  The only point I was trying to make is that the need to
reindex probably shouldn't be written off lightly until one has monitored
the indexes for a spell to see if they need it or not.


  Example:
  bacula=# select relname, relpages from pg_class where relkind='i' and 
  relname not like 'pg_%' order by relname;
  relname| relpages 
  ---+--
   basefiles_pkey|1
   cdimages_pkey |1
   client_name_idx   |2
   client_pkey   |2
   counters_pkey |1
   device_pkey   |1
   file_fp_idx   |41212
  [...]
  
  bacula=# reindex database bacula;
  [...]
  
  relname| relpages 
  ---+--
   basefiles_pkey|1
   cdimages_pkey |1
   client_name_idx   |2
   client_pkey   |2
   counters_pkey |1
   device_pkey   |1
   file_fp_idx   |21367
  [...]
  
  There are some additional indexes that I've snipped from the output that 
  also
  saw some benefit from reindexing, but let's just focus on file_fp_idx.
  
  Please note that the database you're looking at is reindexed _weekly_ by a
  cron job, which means the index bloat you're seeing in the above example is
  the result of normal activity since last Saturday.
  
  I've brought this up before, and I want to point it out again.  I really
  think there are certain workloads that require reindexing.  Luckily for
  this particular workload, it's easy to schedule a job to do so, since I
  know when the backups aren't running :)
  
 
 
 -- 
 
   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
  http://www.commandprompt.com/
 
 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
 PostgreSQL Replication: http://www.commandprompt.com/products/
 
 
 
 
 
 
 


-- 
Bill Moran
Collaborative Fusion Inc.

[EMAIL PROTECTED]
Phone: 412-422-3463x4023


IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Bill Moran
In response to Tom Lane [EMAIL PROTECTED]:

 Bill Moran [EMAIL PROTECTED] writes:
  I don't agree.  I think that regular indexing is mandatory under some
  workloads.  Example:
  ...
  There are some additional indexes that I've snipped from the output that 
  also
  saw some benefit from reindexing, but let's just focus on file_fp_idx.
 
 Can you describe the usage pattern of that index?  I'm curious why it
 doesn't maintain reasonably static size.  How often is the underlying
 table vacuumed?

bacula=# \d file
 Table public.file
   Column   |  Type   |   Modifiers   
+-+---
 fileid | integer | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid  | integer | not null
 pathid | integer | not null
 filenameid | integer | not null
 markid | integer | not null default 0
 lstat  | text| not null
 md5| text| not null
Indexes:
file_pkey PRIMARY KEY, btree (fileid)
file_fp_idx btree (filenameid, pathid)
file_jobid_idx btree (jobid)

Now, that table stores a record for each file that is backed up (i.e. there's
a unique tuple for each time a file is backed up)  To save space in the
database, the file name and file path are stored in separate tables and
referenced by an ID.

This particular server has the following characteristics:
bacula=# select count(*) from file;
  count  
-
 8068956
(1 row)

bacula=# select count(*) from filename;
 count  

 151954
(1 row)

bacula=# select count(*) from path;
 count 
---
 49642
(1 row)

There are 21 jobs, each ranging in size from 2000 - 5000 files.  Each job
runs twice a day.  So you're looking at about 60,000 new rows at midnight
and 60,000 new rows at noon each day.  With the purge cycle, about the
same number of rows are being deleted as are being added, so the table
size stays pretty constant.

Because I know exactly when database activity is occurring on this system,
I have autovacuum disabled, and I manually run a vacuum analyze on this
database twice a day: once at 8:00 AM and again at 4:00 PM.  I had to bump
max_fsm_pages up to 6 to keep vacuum effective.

Note that the index under discussion is the only one in this database that
shows significant bloat.  I could probably just reindex that one on a
regular schedule, but since I know when the database is quiescent, there's
no reason I can think of not to reindex the whole thing.

Anything else I can provide that would be useful?

-- 
Bill Moran
Collaborative Fusion Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Tom Lane
Bill Moran [EMAIL PROTECTED] writes:
 In response to Tom Lane [EMAIL PROTECTED]:
 Can you describe the usage pattern of that index?  I'm curious why it
 doesn't maintain reasonably static size.  How often is the underlying
 table vacuumed?
 ...
 There are 21 jobs, each ranging in size from 2000 - 5000 files.  Each job
 runs twice a day.  So you're looking at about 60,000 new rows at midnight
 and 60,000 new rows at noon each day.  With the purge cycle, about the
 same number of rows are being deleted as are being added, so the table
 size stays pretty constant.
 ...
 Note that the index under discussion is the only one in this database that
 shows significant bloat.

Yeah, and there's no obvious reason in what you say why this one should
bloat either.  Can you say anything about the distribution of the index
columns --- are you working with a fairly static set of filenameids, or
does that change over time?  How about the pathids?  How does the
combination of filenameid x pathid behave?

A bit of quick arithmetic says that the minimum possible size of that
index (at 100% fill factor) would be about 20K pages.  What you showed
us was that it had expanded to 40-some K pages, or a bit under 50% fill
factor.  This is low but not totally out of line; the traditional rule
of thumb is that the steady state fill factor will be about 2/3rds for a
heavily updated btree.  If you leave it go, does it continue to get
larger, or stay around 40K?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Bill Moran
In response to Tom Lane [EMAIL PROTECTED]:

 Bill Moran [EMAIL PROTECTED] writes:
  In response to Tom Lane [EMAIL PROTECTED]:
  Can you describe the usage pattern of that index?  I'm curious why it
  doesn't maintain reasonably static size.  How often is the underlying
  table vacuumed?
  ...
  There are 21 jobs, each ranging in size from 2000 - 5000 files.  Each job
  runs twice a day.  So you're looking at about 60,000 new rows at midnight
  and 60,000 new rows at noon each day.  With the purge cycle, about the
  same number of rows are being deleted as are being added, so the table
  size stays pretty constant.
  ...
  Note that the index under discussion is the only one in this database that
  shows significant bloat.
 
 Yeah, and there's no obvious reason in what you say why this one should
 bloat either.  Can you say anything about the distribution of the index
 columns --- are you working with a fairly static set of filenameids, or
 does that change over time?  How about the pathids?  How does the
 combination of filenameid x pathid behave?

My expectation would be that the distribution stays fairly constant and
that filenameids and pathids don't get added in any great number.  Most
of the servers that are being backed up are not going to see the file
names or paths change very much, just the contents of those files.

 A bit of quick arithmetic says that the minimum possible size of that
 index (at 100% fill factor) would be about 20K pages.  What you showed
 us was that it had expanded to 40-some K pages, or a bit under 50% fill
 factor.  This is low but not totally out of line; the traditional rule
 of thumb is that the steady state fill factor will be about 2/3rds for a
 heavily updated btree.  If you leave it go, does it continue to get
 larger, or stay around 40K?

I don't remember how big it was getting before I added that cron job.  I'll
remove the cron job and replace it with one that emails me the page size
of that index every week.  I'll let it go for a few weeks and see how
it manages.

-- 
Bill Moran
Collaborative Fusion Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] How often do I need to reindex tables?

2007-02-28 Thread Bill Moran
In response to Tom Lane [EMAIL PROTECTED]:

 Bill Moran [EMAIL PROTECTED] writes:
  In response to Tom Lane [EMAIL PROTECTED]:
  Can you describe the usage pattern of that index?  I'm curious why it
  doesn't maintain reasonably static size.  How often is the underlying
  table vacuumed?
  ...
  There are 21 jobs, each ranging in size from 2000 - 5000 files.  Each job
  runs twice a day.  So you're looking at about 60,000 new rows at midnight
  and 60,000 new rows at noon each day.  With the purge cycle, about the
  same number of rows are being deleted as are being added, so the table
  size stays pretty constant.
  ...
  Note that the index under discussion is the only one in this database that
  shows significant bloat.
 
 Yeah, and there's no obvious reason in what you say why this one should
 bloat either.  Can you say anything about the distribution of the index
 columns --- are you working with a fairly static set of filenameids, or
 does that change over time?  How about the pathids?  How does the
 combination of filenameid x pathid behave?
 
 A bit of quick arithmetic says that the minimum possible size of that
 index (at 100% fill factor) would be about 20K pages.  What you showed
 us was that it had expanded to 40-some K pages, or a bit under 50% fill
 factor.  This is low but not totally out of line; the traditional rule
 of thumb is that the steady state fill factor will be about 2/3rds for a
 heavily updated btree.  If you leave it go, does it continue to get
 larger, or stay around 40K?

Just an FYI ... I remembered what prompted the cron job.

We were seeing significant performance degradation.  I never did actual
measurements, but it was on the order of Bill, why is restoring taking
such a long time? from other systems people.  At the time, I poked around
and tried some stuff here and there and found that reindex restored
performance.  I didn't look at actual size at that time.

Anyway, I'll report back in a few weeks as to what the numbers look like.

-- 
Bill Moran
Collaborative Fusion Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] How often do I need to reindex tables?

2007-02-27 Thread Jim C. Nasby
On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:
 I am planning to use 8.2 and the average inserts/deletes and updates
 across all tables is moderate. That is, it is a moderate sized
 database with moderate usage of tables.
 
 Given that, how often do I need to reindex the tables? Do I need to do
 it everyday?
 
No, you should very rarely if ever need to do it.

If you're really concerned, I suggest monitoring average tuples per
index page; something like

SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND
relpages  1000;

That'll show tuples/page for all indexes over 8MB in size.

 Also with 8.2, I do not have to do vacuum anymore or that is what I
 understand. Does it do auto-vacuum?

You still need to enable autovacuum. See autovacuum_enable.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] How often do I need to reindex tables?

2007-02-27 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/27/07 13:26, Dhaval Shah wrote:
 I am planning to use 8.2 and the average inserts/deletes and updates
 across all tables is moderate. That is, it is a moderate sized
 database with moderate usage of tables.

Moderate?




-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFF5MVmS9HxQb37XmcRAu3PAJ9BwYSpuENbeJKweBn4arApxqyiKACgg8pg
1wExzokHE3tLSj5o4MjEaK4=
=GAEs
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly