On Thu, Dec 17, 2009 at 12:40:27AM +0200, Ibrahim Harrani wrote:
> Hi,
>
> I have 5000 ham and 15 000 spam mails. I would like to train dspam on them
> before using it.
> I usually train spam mail first, then ham mail, with the following commands.
> Is it safe to train dspam like this, or do I have to use the dspam_train
> script?
> If I remember correctly, I used dspam_train, but the script was learning spam
> mails as ham because of not enough pattern?
>
> For spam:
>
> for i in `/usr/bin/find /home/spam/ -type f`
> do
>   echo $i; sudo /usr/local/bin/dspam --client --user myglobaluser --class=spam --source=corpus --mode=teft < $i
> done
>
> For ham mails:
>
> for i in `/usr/bin/find /home/ham/ -type f`
> do
>   echo $i; sudo /usr/local/bin/dspam --client --user myglobaluser --class=ham --source=corpus --mode=teft < $i
> done
>
> PS: I started testing dspam 3.9RC2 on FreeBSD with PostgreSQL driver
> support.
I just took a look at the dspam 3.9RC2 tools.pgsql schema definition, and it
is curiously lacking an index on the token data. I would recommend adding one
on (uid, token). Also lower the table's fillfactor so that each page keeps
some free space, which allows HOT updates to the table. Finally, a CLUSTER
using the (uid, token) index should help locality of reference.

Regards,
Ken
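
The three suggestions above could be sketched roughly as follows. This is only an illustration under assumptions the thread does not confirm: the token table is taken to be named dspam_token_data with columns uid and token, and the index name and the fillfactor of 80 are made-up starting points, not tested values. The helper only prints SQL, so it can be reviewed before anything is changed. The fillfactor is set before CLUSTER because it affects only pages written afterwards, and CLUSTER rewrites the table.

```shell
#!/bin/sh
# Print tuning SQL for the DSPAM token table; nothing is executed here.
# ASSUMPTIONS (hypothetical, check your schema first with \d in psql):
#   - table name dspam_token_data with columns uid and token
#   - index name <table>_uid_token_idx and fillfactor 80 are illustrative
dspam_tuning_sql() {
    table=${1:-dspam_token_data}
    fillfactor=${2:-80}
    index="${table}_uid_token_idx"
    cat <<EOF
CREATE INDEX ${index} ON ${table} (uid, token);
ALTER TABLE ${table} SET (fillfactor = ${fillfactor});
CLUSTER ${table} USING ${index};
ANALYZE ${table};
EOF
}
```

To apply it, the output could be piped to psql, for example `dspam_tuning_sql | psql -d dspam`, where the database name dspam is again an assumption. If the schema already carries a unique constraint or primary key on (uid, token), the CREATE INDEX line is redundant and the existing index can be named in the CLUSTER statement instead. CLUSTER takes an exclusive lock and rewrites the table, so it is best run while dspam is idle.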
