Hi Michael

Can you share exactly what you did to improve the rebuildspam speed?


Thanks in advance

/A




 



From:   Michael <[email protected]>
To:     ASSP development mailing list <[email protected]>
Date:   2011-11-22 19:14
Subject:        [Assp-test] SOLVED: best config for rebuildspam and HMM




I got a big improvement by setting up a RAM disk for /tmpDB and making
some minor changes to MySQL.
Instead of more than 3,400 seconds for 4,448 files in /errors/spam, the result
is now:

Nov-22-11 18:30:40 /usr/share/assp/errors/spam
Nov-22-11 18:30:40 File Count:           4,454
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Imported Files:               4,454
Nov-22-11 18:35:31 Finished in 291 second(s)

Completing rebuildspam with HMM now takes ~ 32 Minutes :-))
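
To put the errors/spam numbers in perspective, here is a rough sketch of the arithmetic in Python. It assumes the earlier figures are read as 3400 seconds and 4448 files; since the old run took more than 3400 seconds, the computed speedup is a lower bound. The variable and function names are only illustrative.

```python
# Figures quoted in this message. The "before" values are read as
# 3400 seconds and 4448 files (an assumption about the notation used).
before_files, before_seconds = 4448, 3400   # earlier run, time is a lower bound
after_files, after_seconds = 4454, 291      # run shown in the log excerpt


def files_per_second(files, seconds):
    """Average import throughput for one folder."""
    return files / seconds


before_rate = files_per_second(before_files, before_seconds)
after_rate = files_per_second(after_files, after_seconds)
speedup = after_rate / before_rate

print(f"before:  {before_rate:.2f} files/s")
print(f"after:   {after_rate:.2f} files/s")
print(f"speedup: at least {speedup:.1f}x")
```

This only compares the errors/spam import step, not the whole rebuildspam run.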

Thanks Thomas for the hint! Is the time now much closer to yours?

Regards
Michael


For reference, here is the complete log:



Nov-22-11 18:30:40 RebuildSpamDB-thread rebuildspamdb-version 3.10 started in ASSP version 2.1.2(11321)

Nov-22-11 18:30:40 RebuildSpamDB will create a Hidden Markov Model!

Nov-22-11 18:30:40 ---ASSP Settings---
Nov-22-11 18:30:40 Do Not Collect RedRe Messages: Enabled
**Messages matching the RedRe will be removed from the corpus!**

Nov-22-11 18:30:40 Use Subject as Maillog Names: False
Nov-22-11 18:30:40 Maxbytes: 4000

Nov-22-11 18:30:40 remove /usr/share/assp/spam/13608.eml Trashlist
Nov-22-11 18:30:40 Trashlist cleaning finished, 1 of 70 files deleted

Nov-22-11 18:30:40 /usr/share/assp/errors/spam
Nov-22-11 18:30:40 File Count:           4,454
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Imported Files:               4,454
Nov-22-11 18:35:31 Finished in 291 second(s)

Nov-22-11 18:35:31 /usr/share/assp/errors/notspam
Nov-22-11 18:35:31 File Count:           1,089
Nov-22-11 18:35:31 Processing... errors/notspam with 1089 files
Nov-22-11 18:37:10 Imported Files:               1,089
Nov-22-11 18:37:10 Finished in 99 second(s)

Nov-22-11 18:37:10 /usr/share/assp/spam
Nov-22-11 18:37:10 File Count:           5,897
Nov-22-11 18:37:10 Processing... spam with 5897 files
Nov-22-11 18:42:17 Imported Files:               5,897
Nov-22-11 18:42:17 Finished in 307 second(s)

Nov-22-11 18:42:17 /usr/share/assp/notspam
Nov-22-11 18:42:17 File Count:           17,930
Nov-22-11 18:42:17 Processing... notspam with 17930 files
Nov-22-11 18:54:31 Removed Old:          1
Nov-22-11 18:54:31 Imported Files:               17,929
Nov-22-11 18:54:31 Finished in 734 second(s)

Nov-22-11 18:54:31 Generating weighted Bayesian tuplets
Nov-22-11 19:00:55 cleaning old Spamdb records
Nov-22-11 19:01:50 done - cleaning old Spamdb records - removed 40 from 291431
Nov-22-11 19:01:50 done - Generating weighted Bayesian tuplets

Nov-22-11 19:01:51 Bayesian Pairs: 291,431 new, 291,431 now in list
Nov-22-11 19:01:51 generating Spamdb.helo records from 3641 collected HELO's
Nov-22-11 19:01:52 cleaning old Spamdb.helo records
Nov-22-11 19:01:52 done - cleaning old Spamdb.helo records

Nov-22-11 19:01:52 HELO Blacklist: 0 new, 189 now in list

Nov-22-11 19:01:52 Spam Weight:             7,446,449
Nov-22-11 19:01:52 Not-Spam Weight:   8,632,732

Nov-22-11 19:01:52 Corpus norm:          0.8626 - (ok - slighly ham heavy)
Nov-22-11 19:01:52 Corpus confidence:            0.87918513

Nov-22-11 19:01:57 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Nov-22-11 19:01:57 start populating Hidden Markov Model ham chains with 1743846 records!
Nov-22-11 19:02:21 Finished populating Hidden Markov Model ham chains with 1743846 records!
Nov-22-11 19:02:21 start populating Hidden Markov Model ham totals with 1540715 records!
Nov-22-11 19:02:43 Finished populating Hidden Markov Model ham totals with 1540715 records!
Nov-22-11 19:02:43 start populating Hidden Markov Model spam chains with 1806213 records!
Nov-22-11 19:03:10 Finished populating Hidden Markov Model spam chains with 1806213 records!
Nov-22-11 19:03:10 start populating Hidden Markov Model spam totals with 1637647 records!
Nov-22-11 19:03:33 Finished populating Hidden Markov Model spam totals with 1637647 records!
Nov-22-11 19:03:33 Finished populating Hidden Markov Model. HMM-check is now enabled again!

Nov-22-11 19:03:33 Total processing time: 1973 second(s)

Nov-22-11 19:03:33 Total processing data: 431.30 MByte

Nov-22-11 19:03:33 building new GripList records and bounce report
Nov-22-11 19:03:33 processing Logfile /usr/share/assp/logs/maillog.txt
Nov-22-11 19:03:44 processing Logfile /usr/share/assp/logs/11-11-21.maillog.txt
Nov-22-11 19:04:14 processing Logfile /usr/share/assp/logs/11-11-20.maillog.txt
Nov-22-11 19:04:24 processing Logfile /usr/share/assp/logs/11-11-19.maillog.txt
Nov-22-11 19:04:40 processing Logfile /usr/share/assp/logs/11-11-18.maillog.txt

Nov-22-11 19:04:41 bounce report for the last two days: no bounces received

Nov-22-11 19:04:42 Uploading Griplist via Direct Connection
Nov-22-11 19:04:47 Submitted 4566 bytes: 0 IPv6 addresses, 506 IPv4 addresses

Nov-22-11 19:04:47 Trashlist was saved to /usr/share/assp/trashlist.db
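
If anyone wants to compare runs, a small Python sketch like the following could pull the per-folder timings out of such a log. It is not part of ASSP. The function name and the embedded sample are just for illustration, and it assumes each "Processing..." line is followed by its own "Finished in" line, as in the log above.

```python
import re

# Folder-import lines copied from the log above (timestamps dropped).
LOG = """\
Processing... errors/spam with 4454 files
Finished in 291 second(s)
Processing... errors/notspam with 1089 files
Finished in 99 second(s)
Processing... spam with 5897 files
Finished in 307 second(s)
Processing... notspam with 17930 files
Finished in 734 second(s)
Total processing time: 1973 second(s)
"""


def folder_stats(log_text):
    """Return [(folder, files, seconds)] by pairing each 'Processing...'
    line with the next 'Finished in' line."""
    stats, current = [], None
    for line in log_text.splitlines():
        m = re.search(r"Processing\.\.\. (\S+) with (\d+) files", line)
        if m:
            current = (m.group(1), int(m.group(2)))
            continue
        m = re.search(r"Finished in (\d+) second", line)
        if m and current:
            stats.append((current[0], current[1], int(m.group(1))))
            current = None
    return stats


stats = folder_stats(LOG)
total = int(re.search(r"Total processing time: (\d+)", LOG).group(1))
import_seconds = sum(seconds for _, _, seconds in stats)

for folder, files, seconds in stats:
    print(f"{folder:15s} {files / seconds:6.1f} files/s")
print(f"folder imports: {import_seconds} of {total} s "
      f"({100 * import_seconds / total:.0f}%)")
```

The remaining time in this run goes mostly to generating the Bayesian tuplets and populating the HMM tables.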





------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure 
contains a definitive record of customers, application performance, 
security threats, fraudulent activity, and more. Splunk takes this 
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test

