On Sat, Aug 09, 2008 at 06:19:32PM +0300, Ibrahim Harrani wrote:
> Hi Kenneth,
> 
> Thanks for your prompt reply.
> My answers are inline
> 
> 
> > We have been running DSPAM since the 3.2 days and have wanted to use
> > PostgreSQL instead of MySQL for many reasons. Both DSPAM and PostgreSQL
> > had performance issues that needed to be addressed for this combination
> > to work well in a high load environment. First, you need to be running
> > the latest version of DSPAM, in particular the PG driver should define
> > two functions for lookup_token, one with 1 arg and a second with 2 args.
> I am using cvs version.
> 
> > Second, you need to be running PostgreSQL 8.3.
> 
> # SELECT version();
>                                              version
> -------------------------------------------------------------------------------------------------
>  PostgreSQL 8.3.0 on amd64-portbld-freebsd6.3, compiled by GCC cc
> (GCC) 3.4.6 [FreeBSD] 20060305
> (1 row)
> 
> 
>  Third, you need to
> > setup your tables/indexes to use a < 100% fill-factor, this will allow
> > HOT updates to work.
> 
> # ALTER TABLE dspam_token_data SET ( FILLFACTOR = 100 );
> 
You need to use a value for fillfactor that is less than 100%, otherwise any
update will be forced to use a new data page at the end of the table file
which will randomize your I/O over time.

> 
>  Fourth, cluster your token table on userid, this
> > will localize your disk I/O and HOT will keep it localized.
> 
> # CLUSTER dspam_token_data USING dspam_token_data_uid_key ;
> 
>  I think I should CLUSTER the table regularly right? like following:
> 
> =# CLUSTER dspam_token_data   ;
> 
If you have HOT working correctly, you will need to run CLUSTER much
less frequently, possibly once-a-month or less.

> 
> Fifth and
> > last, you need to remove any index on the fields that are to be updated
> > in the token table. A typical on is on the spam count + not spam count
> > to help expire tokends. If an index exists, then HOT will not be able
> > to function.
> 
> Fifth is not clear?
> # \d dspam_token_data
>    Table "public.dspam_token_data"
>     Column     |   Type   | Modifiers
> ---------------+----------+-----------
>  uid           | smallint |
>  token         | bigint   |
>  spam_hits     | integer  |
>  innocent_hits | integer  |
>  last_hit      | date     |
> Indexes:
>     "dspam_token_data_uid_key" UNIQUE, btree (uid, token) CLUSTER
>     "id_token_data_sumhits" btree ((spam_hits + innocent_hits))
> 
> 
> Are you talking about "id_token_data_sumhits" index on the table?
> Should I drop it?

Yes, that is the one to drop. The HOT update will only happen if the
value being updated will not require an update to an index. In this
case, since when a token is identified the spam_hits and innocent_hits
values are updated depending on whether the message was identified as
spam or not. Either way, the update would force an update to the index
id_token_data_sumhits which would disable the hot updates.

Ken

!DSPAM:1011,489db991150922758512159!


Reply via email to