I got a big improvement by setting up a RAM disk for /tmpDB and making some minor changes to MySQL. Instead of more than 3,400 seconds for 4,448 files in /errors/spam, the result is now:

Nov-22-11 18:30:40 /usr/share/assp/errors/spam
Nov-22-11 18:30:40 File Count: 4,454
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Imported Files: 4,454
Nov-22-11 18:35:31 Finished in 291 second(s)

Completing rebuildspam with HMM now takes about 32 minutes :-))

Thanks Thomas for the hint! Is the time now much closer to yours?

Regards
Michael

For reference, here is the complete log:

Nov-22-11 18:30:40 RebuildSpamDB-thread rebuildspamdb-version 3.10 started in ASSP version 2.1.2(11321)
Nov-22-11 18:30:40 RebuildSpamDB will create a Hidden Markov Model!
Nov-22-11 18:30:40 ---ASSP Settings---
Nov-22-11 18:30:40 Do Not Collect RedRe Messages: Enabled
**Messages matching the RedRe will be removed from the corpus!**
Nov-22-11 18:30:40 Use Subject as Maillog Names: False
Nov-22-11 18:30:40 Maxbytes: 4000
Nov-22-11 18:30:40 remove /usr/share/assp/spam/13608.eml Trashlist
Nov-22-11 18:30:40 Trashlist cleaning finished, 1 of 70 files deleted
Nov-22-11 18:30:40 /usr/share/assp/errors/spam
Nov-22-11 18:30:40 File Count: 4,454
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Imported Files: 4,454
Nov-22-11 18:35:31 Finished in 291 second(s)
Nov-22-11 18:35:31 /usr/share/assp/errors/notspam
Nov-22-11 18:35:31 File Count: 1,089
Nov-22-11 18:35:31 Processing... errors/notspam with 1089 files
Nov-22-11 18:37:10 Imported Files: 1,089
Nov-22-11 18:37:10 Finished in 99 second(s)
Nov-22-11 18:37:10 /usr/share/assp/spam
Nov-22-11 18:37:10 File Count: 5,897
Nov-22-11 18:37:10 Processing... spam with 5897 files
Nov-22-11 18:42:17 Imported Files: 5,897
Nov-22-11 18:42:17 Finished in 307 second(s)
Nov-22-11 18:42:17 /usr/share/assp/notspam
Nov-22-11 18:42:17 File Count: 17,930
Nov-22-11 18:42:17 Processing... notspam with 17930 files
Nov-22-11 18:54:31 Removed Old: 1
Nov-22-11 18:54:31 Imported Files: 17,929
Nov-22-11 18:54:31 Finished in 734 second(s)
Nov-22-11 18:54:31 Generating weighted Bayesian tuplets
Nov-22-11 19:00:55 cleaning old Spamdb records
Nov-22-11 19:01:50 done - cleaning old Spamdb records - removed 40 from 291431
Nov-22-11 19:01:50 done - Generating weighted Bayesian tuplets
Nov-22-11 19:01:51 Bayesian Pairs: 291,431 new, 291,431 now in list
Nov-22-11 19:01:51 generating Spamdb.helo records from 3641 collected HELO's
Nov-22-11 19:01:52 cleaning old Spamdb.helo records
Nov-22-11 19:01:52 done - cleaning old Spamdb.helo records
Nov-22-11 19:01:52 HELO Blacklist: 0 new, 189 now in list
Nov-22-11 19:01:52 Spam Weight: 7,446,449
Nov-22-11 19:01:52 Not-Spam Weight: 8,632,732
Nov-22-11 19:01:52 Corpus norm: 0.8626 - (ok - slighly ham heavy)
Nov-22-11 19:01:52 Corpus confidence: 0.87918513
Nov-22-11 19:01:57 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Nov-22-11 19:01:57 start populating Hidden Markov Model ham chains with 1743846 records!
Nov-22-11 19:02:21 Finished populating Hidden Markov Model ham chains with 1743846 records!
Nov-22-11 19:02:21 start populating Hidden Markov Model ham totals with 1540715 records!
Nov-22-11 19:02:43 Finished populating Hidden Markov Model ham totals with 1540715 records!
Nov-22-11 19:02:43 start populating Hidden Markov Model spam chains with 1806213 records!
Nov-22-11 19:03:10 Finished populating Hidden Markov Model spam chains with 1806213 records!
Nov-22-11 19:03:10 start populating Hidden Markov Model spam totals with 1637647 records!
Nov-22-11 19:03:33 Finished populating Hidden Markov Model spam totals with 1637647 records!
Nov-22-11 19:03:33 Finished populating Hidden Markov Model. HMM-check is now enabled again!
Nov-22-11 19:03:33 Total processing time: 1973 second(s)
Nov-22-11 19:03:33 Total processing data: 431.30 MByte
Nov-22-11 19:03:33 building new GripList records and bounce report
Nov-22-11 19:03:33 processing Logfile /usr/share/assp/logs/maillog.txt
Nov-22-11 19:03:44 processing Logfile /usr/share/assp/logs/11-11-21.maillog.txt
Nov-22-11 19:04:14 processing Logfile /usr/share/assp/logs/11-11-20.maillog.txt
Nov-22-11 19:04:24 processing Logfile /usr/share/assp/logs/11-11-19.maillog.txt
Nov-22-11 19:04:40 processing Logfile /usr/share/assp/logs/11-11-18.maillog.txt
Nov-22-11 19:04:41 bounce report for the last two days: no bounces received
Nov-22-11 19:04:42 Uploading Griplist via Direct Connection
Nov-22-11 19:04:47 Submitted 4566 bytes: 0 IPv6 addresses, 506 IPv4 addresses
Nov-22-11 19:04:47 Trashlist was saved to /usr/share/assp/trashlist.db

_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test
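
To put the errors/spam improvement in numbers, here is a minimal Python sketch using only the figures quoted in this message. The "before" time of 3,400 seconds is just the lower bound given above ("more than 3,400 seconds"), so the resulting speedup should be read as a minimum, not an exact measurement. The variable and function names are made up for illustration.

```python
# Figures taken from the message and the log above.
# The "before" time is only a lower bound ("> 3,400 seconds").
before_files, before_seconds = 4448, 3400
after_files, after_seconds = 4454, 291


def files_per_second(files, seconds):
    """Average import rate for one corpus directory."""
    return files / seconds


before_rate = files_per_second(before_files, before_seconds)
after_rate = files_per_second(after_files, after_seconds)

# Because before_seconds is a lower bound, before_rate is an upper
# bound and the speedup computed here is a lower bound.
speedup = after_rate / before_rate

print(f"before: at most {before_rate:.2f} files/s")
print(f"after:  {after_rate:.2f} files/s")
print(f"speedup: at least {speedup:.1f}x")
```

The same function can be applied to the other directories in the log (for example 17,930 files in 734 seconds for notspam) to compare import rates per folder.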
