I got a big improvement by setting up a RAM disk for /tmpDB and making
some minor changes to MySQL.
Instead of more than 3,400 seconds for 4,448 files in /errors/spam, the result is now:

Nov-22-11 18:30:40 /usr/share/assp/errors/spam
Nov-22-11 18:30:40 File Count:  4,454
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Imported Files:      4,454
Nov-22-11 18:35:31 Finished in 291 second(s)
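
To put the two measurements side by side, they can be converted to files per second. This is only a rough sketch: the helper name `files_per_second` is made up for illustration, and because the old run took "more than" 3,400 seconds, the old rate is an upper bound and the speedup a lower bound.

```python
def files_per_second(files, seconds):
    """Average import rate for one folder."""
    return files / seconds

# Old run: more than 3,400 s for 4,448 files, so this rate is an upper bound.
before = files_per_second(4448, 3400)
# New run with /tmpDB on a RAM disk: 291 s for 4,454 files (from the log).
after = files_per_second(4454, 291)
speedup = after / before

print(f"before:  at most {before:.2f} files/s")
print(f"after:   {after:.2f} files/s")
print(f"speedup: at least {speedup:.1f}x")
```

The two runs processed slightly different file counts (4,448 vs. 4,454), which is why the comparison uses rates rather than raw seconds.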

A complete rebuildspamdb run with HMM now takes ~32 minutes :-))

Thanks Thomas for the hint! Is the time now much closer to yours?

Regards
Michael


For reference, here is the complete log:



Nov-22-11 18:30:40 RebuildSpamDB-thread rebuildspamdb-version 3.10 started in ASSP version 2.1.2(11321)

Nov-22-11 18:30:40 RebuildSpamDB will create a Hidden Markov Model!

Nov-22-11 18:30:40 ---ASSP Settings---
Nov-22-11 18:30:40 Do Not Collect RedRe Messages: Enabled
**Messages matching the RedRe will be removed from the corpus!**

Nov-22-11 18:30:40 Use Subject as Maillog Names: False
Nov-22-11 18:30:40 Maxbytes: 4000

Nov-22-11 18:30:40 remove /usr/share/assp/spam/13608.eml Trashlist
Nov-22-11 18:30:40 Trashlist cleaning finished, 1 of 70 files deleted

Nov-22-11 18:30:40 /usr/share/assp/errors/spam
Nov-22-11 18:30:40 File Count:  4,454
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Imported Files:      4,454
Nov-22-11 18:35:31 Finished in 291 second(s)

Nov-22-11 18:35:31 /usr/share/assp/errors/notspam
Nov-22-11 18:35:31 File Count:  1,089
Nov-22-11 18:35:31 Processing... errors/notspam with 1089 files
Nov-22-11 18:37:10 Imported Files:      1,089
Nov-22-11 18:37:10 Finished in 99 second(s)

Nov-22-11 18:37:10 /usr/share/assp/spam
Nov-22-11 18:37:10 File Count:  5,897
Nov-22-11 18:37:10 Processing... spam with 5897 files
Nov-22-11 18:42:17 Imported Files:      5,897
Nov-22-11 18:42:17 Finished in 307 second(s)

Nov-22-11 18:42:17 /usr/share/assp/notspam
Nov-22-11 18:42:17 File Count:  17,930
Nov-22-11 18:42:17 Processing... notspam with 17930 files
Nov-22-11 18:54:31 Removed Old: 1
Nov-22-11 18:54:31 Imported Files:      17,929
Nov-22-11 18:54:31 Finished in 734 second(s)

Nov-22-11 18:54:31 Generating weighted Bayesian tuplets
Nov-22-11 19:00:55 cleaning old Spamdb records
Nov-22-11 19:01:50 done - cleaning old Spamdb records - removed 40 from 291431
Nov-22-11 19:01:50 done - Generating weighted Bayesian tuplets

Nov-22-11 19:01:51 Bayesian Pairs: 291,431 new, 291,431 now in list
Nov-22-11 19:01:51 generating Spamdb.helo records from 3641 collected HELO's
Nov-22-11 19:01:52 cleaning old Spamdb.helo records
Nov-22-11 19:01:52 done - cleaning old Spamdb.helo records

Nov-22-11 19:01:52 HELO Blacklist: 0 new, 189 now in list

Nov-22-11 19:01:52 Spam Weight:    7,446,449
Nov-22-11 19:01:52 Not-Spam Weight:   8,632,732

Nov-22-11 19:01:52 Corpus norm: 0.8626 - (ok - slighly ham heavy)
Nov-22-11 19:01:52 Corpus confidence:   0.87918513

Nov-22-11 19:01:57 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Nov-22-11 19:01:57 start populating Hidden Markov Model ham chains with 1743846 records!
Nov-22-11 19:02:21 Finished populating Hidden Markov Model ham chains with 1743846 records!
Nov-22-11 19:02:21 start populating Hidden Markov Model ham totals with 1540715 records!
Nov-22-11 19:02:43 Finished populating Hidden Markov Model ham totals with 1540715 records!
Nov-22-11 19:02:43 start populating Hidden Markov Model spam chains with 1806213 records!
Nov-22-11 19:03:10 Finished populating Hidden Markov Model spam chains with 1806213 records!
Nov-22-11 19:03:10 start populating Hidden Markov Model spam totals with 1637647 records!
Nov-22-11 19:03:33 Finished populating Hidden Markov Model spam totals with 1637647 records!
Nov-22-11 19:03:33 Finished populating Hidden Markov Model. HMM-check is now enabled again!

Nov-22-11 19:03:33 Total processing time: 1973 second(s)

Nov-22-11 19:03:33 Total processing data: 431.30 MByte

Nov-22-11 19:03:33 building new GripList records and bounce report
Nov-22-11 19:03:33 processing Logfile /usr/share/assp/logs/maillog.txt
Nov-22-11 19:03:44 processing Logfile /usr/share/assp/logs/11-11-21.maillog.txt
Nov-22-11 19:04:14 processing Logfile /usr/share/assp/logs/11-11-20.maillog.txt
Nov-22-11 19:04:24 processing Logfile /usr/share/assp/logs/11-11-19.maillog.txt
Nov-22-11 19:04:40 processing Logfile /usr/share/assp/logs/11-11-18.maillog.txt

Nov-22-11 19:04:41 bounce report for the last two days: no bounces received

Nov-22-11 19:04:42 Uploading Griplist via Direct Connection
Nov-22-11 19:04:47 Submitted 4566 bytes: 0 IPv6 addresses, 506 IPv4 addresses

Nov-22-11 19:04:47 Trashlist was saved to /usr/share/assp/trashlist.db
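
The per-folder timings in the log can be summed to see how the 1973 seconds split between importing files and the later steps (Bayesian tuplets, HMM). The following is a minimal sketch, not part of ASSP: the regular expressions and the function name `summarize` are assumptions based only on the line formats visible in this log, and the embedded excerpt is copied from the lines above.

```python
import re

# Excerpt copied from the log above (only the lines the parser needs).
LOG = """\
Nov-22-11 18:30:40 Processing... errors/spam with 4454 files
Nov-22-11 18:35:31 Finished in 291 second(s)
Nov-22-11 18:35:31 Processing... errors/notspam with 1089 files
Nov-22-11 18:37:10 Finished in 99 second(s)
Nov-22-11 18:37:10 Processing... spam with 5897 files
Nov-22-11 18:42:17 Finished in 307 second(s)
Nov-22-11 18:42:17 Processing... notspam with 17930 files
Nov-22-11 18:54:31 Finished in 734 second(s)
Nov-22-11 19:03:33 Total processing time: 1973 second(s)
"""

# Patterns guessed from the log lines shown in this mail.
START = re.compile(r"Processing\.\.\. (\S+) with (\d+) files")
DONE = re.compile(r"Finished in (\d+) second\(s\)")
TOTAL = re.compile(r"Total processing time: (\d+) second\(s\)")

def summarize(log_text):
    """Return ([(folder, files, seconds), ...], total_seconds)."""
    stages, current, total = [], None, None
    for line in log_text.splitlines():
        if m := START.search(line):
            current = (m.group(1), int(m.group(2)))
        elif (m := DONE.search(line)) and current:
            stages.append((current[0], current[1], int(m.group(1))))
            current = None
        elif m := TOTAL.search(line):
            total = int(m.group(1))
    return stages, total

stages, total = summarize(LOG)
for folder, files, seconds in stages:
    rate = files / seconds
    print(f"{folder:15s} {files:6d} files {seconds:4d} s {rate:5.1f} files/s")

import_seconds = sum(seconds for _, _, seconds in stages)
print(f"import: {import_seconds} s, other steps: {total - import_seconds} s")
```

For this run the four folders account for 1431 of the 1973 seconds, so the remaining 542 seconds go to the steps between 18:54:31 and 19:03:33.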




