Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

Richard Broersma Jr Wed, 19 Sep 2007 07:32:52 -0700

--- Phoenix Kiula <[EMAIL PROTECTED]> wrote:

> Then I am confused again about how the fill factor stuff works. Let's
> say I have a table with four BTREE indexes. Should all of them have a
> fill factor of about 60 (lower than the default 90, that is) to be
> effective? Or will it help if I lower the fill factor on only a couple
> of the most often used ones? The primary key index is very, very
> rarely updated so I don't need it to have a fill factor.
> 
> I could try and see these one by one, but that's no better than
> touching/feeling the database blind-folded. I would love to see some
> writeup about this whole thing, but it seems hard to come by!


I will try to explain everything that I understand about indexes and tables.  I 
am sure that if
some part of my understanding is incorrect, someone will chime in to correct me.

In PostgreSQL, tables are physically distinct from indexes.  This means that 
any give table is
written to disk as a file(s), and indexes are also written to disk as a 
separate file(s).  A table
and index are both saved to disk in segmented block referred to a pages (I 
believe the default
size is 8K).  

The advantage of the index file is that it is significantly smaller in size, so 
it takes less time
to sequentially scan and less time to read from disk.

Now when you want to find a record, PostgreSQL will/(may choose to) 
sequentially scan the index
until it find the record entry that corresponds with your criteria.  This entry 
has a table
cross-reference to the actual page that contains the record that is "pointed" 
at by the index.
Lastly, the entire table page containing your record is read from disk to 
memory for further query
processing.

<ASSUMPTION>
When you insert a record into a table that generates an entry into the b-tree 
index file,
PostgreSQL will scan the pages of the index file to find the correct place and 
index page to add
this entry.  If the page is already full, PostgreSQL "probably" replaces the 
old full pages with
two new pages with a distribution of that chunk of the B-tree index, and then 
adds the new entry
to one of those pages. This operation can become very expensive if many new 
pages need to be
created from single INSERT/UPDATE statement.
</ASSUMPTION>

By using fill factor, you are telling PostgreSQL to automatically leave a 
portion of any newly
created index page partially blank for future use.  When a newly created index 
entry needs to be
saved, it can be stored in one of the "holes" left in the index page.  

A large fill factor create both advantages and dis-advantages.  For writing 
operations, it is a
big advantage because, a large fill factor will leave alot of holes for INSERT 
and UPDATE
operations to use.  This can help increase the number of UPDATE/INSERT per 
seconds that you server
can handle since, they index pages do not have to be rebuilt very often.

However, the disadvantage is that, a newly created index with a large fill 
factor has "index
bloat" designed into it.  This mean that the index pages have a large portion 
of holes.  So
PostgreSQL will create more index pages than it would normally in order to hold 
both your index
and the pre-defined holes that you specified by your fill-factor.  Larger 
indexes require more
time to be read from disk to memory and will require more time to sequentially 
scan to find to
find the cross-reference table page location of records of interest.  So the 
net effect is that
larger indexes will make SELECT statement slower.

This is my understanding for tables indexes and fill factor. I hope it helps.

Regards,
Richard Broersma Jr.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

Reply via email to