Hi Michael,

Can you share exactly what you did to improve the rebuildspam speed?
Thanks in advance
/A

From: Michael <[email protected]>
To: ASSP development mailing list <[email protected]>
Date: 2011-11-22 19:14
Subject: [Assp-test] SOLVED: best config for rebuildspam and HMM

I got a big improvement now by setting up a RAM-Disk for /tmpDB and some minor changes to mysql.

Instead of > 3.400 seconds for 4.448 files in /errors/spam, now the result is:

Nov-22-11 18:30:40 /usr/share/assp/errors/spam
Nov-22-11 18:30:40 File Count: 4,454
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Imported Files: 4,454
Nov-22-11 18:35:31 Finished in 291 second(s)

Completing rebuildspam with HMM now takes ~ 32 Minutes :-))

Thanks Thomas for the hint! Is the time now much closer to yours?

Regards
Michael

For information the complete log:

Nov-22-11 18:30:40 RebuildSpamDB-thread rebuildspamdb-version 3.10 started in ASSP version 2.1.2(11321)
Nov-22-11 18:30:40 RebuildSpamDB will create a Hidden Markov Model!
Nov-22-11 18:30:40 ---ASSP Settings---
Nov-22-11 18:30:40 Do Not Collect RedRe Messages: Enabled **Messages matching the RedRe will be removed from the corpus!**
Nov-22-11 18:30:40 Use Subject as Maillog Names: False
Nov-22-11 18:30:40 Maxbytes: 4000
Nov-22-11 18:30:40 remove /usr/share/assp/spam/13608.eml Trashlist
Nov-22-11 18:30:40 Trashlist cleaning finished, 1 of 70 files deleted
Nov-22-11 18:30:40 /usr/share/assp/errors/spam
Nov-22-11 18:30:40 File Count: 4,454
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Imported Files: 4,454
Nov-22-11 18:35:31 Finished in 291 second(s)
Nov-22-11 18:35:31 /usr/share/assp/errors/notspam
Nov-22-11 18:35:31 File Count: 1,089
Nov-22-11 18:35:31 Processing... errors/notspam with 1089 files
Nov-22-11 18:37:10 Imported Files: 1,089
Nov-22-11 18:37:10 Finished in 99 second(s)
Nov-22-11 18:37:10 /usr/share/assp/spam
Nov-22-11 18:37:10 File Count: 5,897
Nov-22-11 18:37:10 Processing... spam with 5897 files
Nov-22-11 18:42:17 Imported Files: 5,897
Nov-22-11 18:42:17 Finished in 307 second(s)
Nov-22-11 18:42:17 /usr/share/assp/notspam
Nov-22-11 18:42:17 File Count: 17,930
Nov-22-11 18:42:17 Processing... notspam with 17930 files
Nov-22-11 18:54:31 Removed Old: 1
Nov-22-11 18:54:31 Imported Files: 17,929
Nov-22-11 18:54:31 Finished in 734 second(s)
Nov-22-11 18:54:31 Generating weighted Bayesian tuplets
Nov-22-11 19:00:55 cleaning old Spamdb records
Nov-22-11 19:01:50 done - cleaning old Spamdb records - removed 40 from 291431
Nov-22-11 19:01:50 done - Generating weighted Bayesian tuplets
Nov-22-11 19:01:51 Bayesian Pairs: 291,431 new, 291,431 now in list
Nov-22-11 19:01:51 generating Spamdb.helo records from 3641 collected HELO's
Nov-22-11 19:01:52 cleaning old Spamdb.helo records
Nov-22-11 19:01:52 done - cleaning old Spamdb.helo records
Nov-22-11 19:01:52 HELO Blacklist: 0 new, 189 now in list
Nov-22-11 19:01:52 Spam Weight: 7,446,449
Nov-22-11 19:01:52 Not-Spam Weight: 8,632,732
Nov-22-11 19:01:52 Corpus norm: 0.8626 - (ok - slighly ham heavy)
Nov-22-11 19:01:52 Corpus confidence: 0.87918513
Nov-22-11 19:01:57 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Nov-22-11 19:01:57 start populating Hidden Markov Model ham chains with 1743846 records!
Nov-22-11 19:02:21 Finished populating Hidden Markov Model ham chains with 1743846 records!
Nov-22-11 19:02:21 start populating Hidden Markov Model ham totals with 1540715 records!
Nov-22-11 19:02:43 Finished populating Hidden Markov Model ham totals with 1540715 records!
Nov-22-11 19:02:43 start populating Hidden Markov Model spam chains with 1806213 records!
Nov-22-11 19:03:10 Finished populating Hidden Markov Model spam chains with 1806213 records!
Nov-22-11 19:03:10 start populating Hidden Markov Model spam totals with 1637647 records!
Nov-22-11 19:03:33 Finished populating Hidden Markov Model spam totals with 1637647 records!
Nov-22-11 19:03:33 Finished populating Hidden Markov Model. HMM-check is now enabled again!
Nov-22-11 19:03:33 Total processing time: 1973 second(s)
Nov-22-11 19:03:33 Total processing data: 431.30 MByte
Nov-22-11 19:03:33 building new GripList records and bounce report
Nov-22-11 19:03:33 processing Logfile /usr/share/assp/logs/maillog.txt
Nov-22-11 19:03:44 processing Logfile /usr/share/assp/logs/11-11-21.maillog.txt
Nov-22-11 19:04:14 processing Logfile /usr/share/assp/logs/11-11-20.maillog.txt
Nov-22-11 19:04:24 processing Logfile /usr/share/assp/logs/11-11-19.maillog.txt
Nov-22-11 19:04:40 processing Logfile /usr/share/assp/logs/11-11-18.maillog.txt
Nov-22-11 19:04:41 bounce report for the last two days: no bounces received
Nov-22-11 19:04:42 Uploading Griplist via Direct Connection
Nov-22-11 19:04:47 Submitted 4566 bytes: 0 IPv6 addresses, 506 IPv4 addresses
Nov-22-11 19:04:47 Trashlist was saved to /usr/share/assp/trashlist.db

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity, and more. Splunk takes this data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test

_________________________________________________________________________________________________________________________________________________
NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify the sender by reply email and destroy the original message.
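
As a rough cross-check of the timings in the quoted log, the per-folder "Finished in N second(s)" lines can be added up and compared with the reported total. This is only a minimal sketch in Python: the function name summarize_rebuild_log and the embedded excerpt are illustrative and not part of ASSP, and the script assumes the log line wording shown above.

```python
import re

# Excerpt of the timing lines from the rebuildspamdb log quoted above.
LOG_EXCERPT = """\
Nov-22-11 18:35:31 Finished in 291 second(s)
Nov-22-11 18:37:10 Finished in 99 second(s)
Nov-22-11 18:42:17 Finished in 307 second(s)
Nov-22-11 18:54:31 Finished in 734 second(s)
Nov-22-11 19:03:33 Total processing time: 1973 second(s)
"""


def summarize_rebuild_log(log_text):
    """Return (per-folder import times, reported total) in seconds.

    The reported total is None if the log has no 'Total processing time' line.
    """
    phases = [int(s) for s in re.findall(r"Finished in (\d+) second\(s\)", log_text)]
    match = re.search(r"Total processing time: (\d+) second\(s\)", log_text)
    total = int(match.group(1)) if match else None
    return phases, total


phases, total = summarize_rebuild_log(LOG_EXCERPT)
import_seconds = sum(phases)
print("file import:", import_seconds, "s")
if total is not None:
    # Everything that is not file import: Bayesian tuplets, Spamdb cleanup, HMM population.
    print("other steps:", total - import_seconds, "s")
    print("total:", round(total / 60, 1), "min")
```

With the numbers from this log, the four folders account for 1431 of the 1973 seconds, so the remaining 542 seconds go to generating the Bayesian tuplets, cleaning Spamdb and populating the HMM.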
