On Thu, Dec 17, 2009 at 12:40:27AM +0200, Ibrahim Harrani wrote:
> Hi,
> 
> I have 5,000 ham and 15,000 spam mails. I would like to train dspam on them
> before using it.
> I usually train the spam mail first, then the ham mail, with the following
> commands. Is it safe to train dspam like this, or do I have to use the
> dspam_train script? If I remember correctly, I used dspam_train before, but
> the script was learning spam mails as ham, perhaps because there were not
> enough patterns?
> 
> 
> For spam:
> for i in `/usr/bin/find /home/spam/ -type f`
>         do
>                 echo $i; sudo /usr/local/bin/dspam --client --user myglobaluser \
>                     --class=spam --source=corpus --mode=teft < $i
> 
>         done
> 
> For ham mails:
> 
> for i in `/usr/bin/find /home/ham/ -type f`
>         do
>                 echo $i; sudo /usr/local/bin/dspam --client --user myglobaluser \
>                     --class=ham --source=corpus --mode=teft < $i
> 
>         done
> 
> PS: I have started testing dspam 3.9RC2 on FreeBSD with the PostgreSQL
> driver.
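
The two loops above can be folded into a single function. This is a minimal POSIX sh sketch, not the dspam_train script: the function name `train_dir` and the `DSPAM` override variable are hypothetical additions, while the dspam options are copied unchanged from the commands above. Reading the `find` output line by line and quoting the file name keeps messages whose names contain spaces intact, which the backtick `for` loop does not.

```shell
# train_dir CLASS DIR (hypothetical helper)
# Feed every regular file under DIR to dspam as a corpus message of CLASS.
# DSPAM (hypothetical) may hold a command plus arguments; it is left unquoted
# on purpose so the default "sudo /usr/local/bin/dspam" splits into two words.
train_dir() {
    class=$1
    dir=$2
    find "$dir" -type f -print | while IFS= read -r msg; do
        printf '%s\n' "$msg"
        # stdin comes from the message file, so dspam cannot consume
        # the list of file names that the loop itself is reading
        ${DSPAM:-sudo /usr/local/bin/dspam} --client --user myglobaluser \
            --class="$class" --source=corpus --mode=teft < "$msg"
    done
}
```

Usage: `train_dir spam /home/spam/` followed by `train_dir innocent /home/ham/`. The DSPAM documentation names the non-spam class `innocent`, so that spelling is used here.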

I just took a look at the dspam 3.9RC2 tools.pgsql schema definition,
and it is curiously lacking an index on the token data table. I would
recommend a composite index on (uid, token). Also lower the table's
fillfactor so that each page keeps free space for HOT (heap-only tuple)
updates. Finally, running CLUSTER on the table with the (uid, token)
index should improve locality of reference.
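
A minimal sketch of those three steps, assuming the token table is `dspam_token_data` with `uid` and `token` columns and PostgreSQL 8.3 or later (HOT and the `CLUSTER ... USING` syntax both appeared in 8.3). The function name `dspam_tuning_sql`, the index name and the default fillfactor of 90 are illustrative choices, not values from the dspam distribution. The function only prints the SQL, so it can be reviewed before it is run.

```shell
# dspam_tuning_sql [TABLE] [FILLFACTOR]  (hypothetical helper)
# Print SQL applying the suggestions above: free space for HOT updates,
# a composite (uid, token) index, and a CLUSTER on that index.
dspam_tuning_sql() {
    table=${1:-dspam_token_data}   # assumed table name
    ff=${2:-90}                    # assumed fillfactor; the table default is 100
    idx="${table}_uid_token_idx"   # hypothetical index name
    cat <<EOF
-- Leave free space on each heap page so updates can stay HOT.
ALTER TABLE $table SET (fillfactor = $ff);
-- Composite index for lookups by (uid, token).
CREATE INDEX $idx ON $table (uid, token);
-- Rewrite the table in index order; the rewrite also applies the new fillfactor.
CLUSTER $table USING $idx;
-- Refresh planner statistics after the rewrite.
ANALYZE $table;
EOF
}
```

Usage: `dspam_tuning_sql | psql -d dspam`, where the database name is an assumption. HOT only applies when no indexed column changes, which fits here because the hit counters are not part of the (uid, token) index. CLUSTER holds an exclusive lock on the table while it rewrites it, so run it while dspam is idle.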

Regards,
Ken

_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
