- restore the corpus
- remove the file 'normfile'
- set 'RebuildTestMode' to on
- run a rebuild
- run a second rebuild
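
For anyone who wants to script the 'normfile' step, here is a minimal sketch. It assumes 'normfile' sits directly in the ASSP base directory (the log below reports it as C:/assp/normfile). The helper name remove_normfile is made up for this example and is not part of ASSP. Restoring the corpus, switching 'RebuildTestMode' on, and starting the rebuilds are still done in ASSP itself.

```python
from pathlib import Path


def remove_normfile(assp_base: str) -> bool:
    """Delete <assp_base>/normfile if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    The file name and location are assumptions taken from the rebuild log.
    """
    normfile = Path(assp_base) / "normfile"
    if normfile.is_file():
        normfile.unlink()
        return True
    return False


if __name__ == "__main__":
    # Hypothetical base directory; adjust to the local installation.
    print(remove_normfile("C:/assp"))
```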
Thomas
From: Steve Moffat <st...@optimum.bm>
To: "'assp-test@lists.sourceforge.net'"
<assp-test@lists.sourceforge.net>
Date: 12.09.2012 18:51
Subject: [Assp-test] FW: RebuildSpamDB - report from assp.isp.bm
Hi,
Just ran rebuildspamdb with the new release. The results are even
worse. Before this I had a perfect corpus.
Steve
-----Original Message-----
From: assp@assp.local [mailto:assp@assp.local]
Sent: Wednesday, September 12, 2012 1:50 PM
To: Steve Moffat
Subject: RebuildSpamDB - report from assp.isp.bm
File rebuildrun.txt follows:
Sep-12-12 13:18:25 RebuildSpamDB-thread rebuildspamdb-version 6.02 started
in ASSP version 2.2.2(12256)
Sep-12-12 13:18:25 RebuildSpamDB will create a Hidden Markov Model!
Sep-12-12 13:18:25 RebuildSpamDB will include attachment-database-entries
in to spamdb!
Sep-12-12 13:18:25 RebuildSpamDB will create unicode enabled databases.
Sep-12-12 13:18:25 RebuildSpamDB process all words as Sequence of UAX #29
Grapheme Clusters.
Sep-12-12 13:18:25 RebuildSpamDB will use the ASSP_WordStem engine.
Sep-12-12 13:18:25 ---ASSP Settings---
Sep-12-12 13:18:25 Do Not Collect RedRe Messages: Enabled **Messages
matching the RedRe will be removed from the corpus!**
Sep-12-12 13:18:25 Use Subject as Maillog Names: True
Sep-12-12 13:18:25 Maxbytes: 4000
Sep-12-12 13:18:25 RebuildFileTimeLimit: 1 5
Sep-12-12 13:18:25 RebuildFileTimeLimit: files will be moved away from the
corpus, if there processing takes longer than 5 second(s)
Sep-12-12 13:18:25 C:/assp/errors/spam
Sep-12-12 13:18:25 File Count: 319
Sep-12-12 13:18:25 Processing... errors/spam with 319 files
Sep-12-12 13:18:25 ignore and remove files older than Dec-17-09 12:18:25
in folder errors/spam
Sep-12-12 13:18:33 1 attachment/image entries processed
Sep-12-12 13:18:33 Imported Files: 317
Sep-12-12 13:18:33 Finished in 8 second(s)
Sep-12-12 13:18:33 C:/assp/errors/notspam
Sep-12-12 13:18:33 File Count: 113
Sep-12-12 13:18:33 Processing... errors/notspam with 113 files
Sep-12-12 13:18:33 ignore and remove files older than Dec-17-09 12:18:33
in folder errors/notspam
Sep-12-12 13:18:40 26 attachment/image entries processed
Sep-12-12 13:18:40 Imported Files: 111
Sep-12-12 13:18:40 Finished in 7 second(s)
Sep-12-12 13:18:40 warning: missing information for automatic corpus
correction in file C:/assp/normfile - rerun the rebuild, if you see this
warning the first time!
Sep-12-12 13:18:40 C:/assp/spam
Sep-12-12 13:18:40 File Count: 4,363
Sep-12-12 13:18:40 Processing... spam with 4,363 files
Sep-12-12 13:19:27 remove
C:/assp/spam/Confirmation_of_changes_to_Boo--140013.eml WhiteList:
'ba.custs...@contact.britishairways.com'
Sep-12-12 13:19:27 remove
C:/assp/spam/Confirmation_of_changes_to_Boo--144011.eml WhiteList:
'ba.custs...@contact.britishairways.com'
Sep-12-12 13:19:27 remove
C:/assp/spam/Confirmation_of_changes_to_Boo--145936.eml WhiteList:
'ba.custs...@contact.britishairways.com'
Sep-12-12 13:19:27 remove
C:/assp/spam/Confirmation_of_changes_to_Boo--172792.eml WhiteList:
'ba.custs...@contact.britishairways.com'
Sep-12-12 13:20:07 remove
C:/assp/spam/FW_Time_Clarification_Walk_the--81794.eml WhiteList:
'busbysu...@hotmail.com'
Sep-12-12 13:22:50 Removed White: 5
Sep-12-12 13:22:50 481 attachment/image entries processed
Sep-12-12 13:22:50 Imported Files: 4,356
Sep-12-12 13:22:50 Finished in 250 second(s)
Sep-12-12 13:22:50 C:/assp/notspam
Sep-12-12 13:22:50 File Count: 12,640
Sep-12-12 13:22:50 Processing... notspam with 12,000 files
Sep-12-12 13:42:28 2,022 attachment/image entries processed
Sep-12-12 13:42:28 Imported Files: 12,001
Sep-12-12 13:42:28 Folder contents exceeded 'MaxFiles'(12000).
Sep-12-12 13:42:28 Finished in 1,178 second(s)
Sep-12-12 13:42:28 Rebuild processed 11.63 files per second.
Sep-12-12 13:42:28 Generating weighted Bayesian tuplets
Sep-12-12 13:42:38 start populating Spamdb with 175,796 records - Bayesian
check is now disabled!
Sep-12-12 13:43:45 Finished populating Spamdb with 175,796 records -
Bayesian check is now enabled!
Sep-12-12 13:43:45 done - Generating weighted Bayesian tuplets
Sep-12-12 13:43:45 Bayesian Pairs: 175,796 now in list
Sep-12-12 13:43:45 Generating consolidated Hidden-Markov-Model database
from 1,634,405 record model
Sep-12-12 13:45:16 HMM sequences: 800,876 now in list
Sep-12-12 13:45:16 generating Spamdb.helo records from 3,664 collected
HELO's
Sep-12-12 13:45:16 cleaning old Spamdb.helo records
Sep-12-12 13:45:17 done - cleaning old Spamdb.helo records
Sep-12-12 13:45:17 HELO Blacklist: 3 new, 94 now in list
Sep-12-12 13:45:17 Spam Weight: 1,598,969
Sep-12-12 13:45:17 Not-Spam Weight: 4,554,517
Sep-12-12 13:45:17 Corpus norm: 0.3511 - (warning: extremely ham
heavy)
Sep-12-12 13:45:17 Corpus confidence: 0.13526783
Sep-12-12 13:45:17 Recommendation: RebuildSpamDB will limit the number of
used messages in your corpus. Excess files will be ingored.
Sep-12-12 13:45:17 Corpus norm should be between 0.6 and 1.4
Sep-12-12 13:45:17 Recommendation: You need more spam messages in the
corpus.
Sep-12-12 13:45:17 starting auto correction for corpus - delete old ham
files from notspam
Sep-12-12 13:45:22 info: starting cleanup for to much (old) files in
folder C:/assp/notspam - will try to remove 40% of the files - will keep
at least 4000 files - will keep files younger than 14 days
info: deleted 1646 old files from folder C:/assp/notspam
Sep-12-12 13:45:22 Recommendation: You should reduce now MaxBytes to 2500!
Sep-12-12 13:45:27 Start populating Hidden Markov Model. HMM-check is
disabled for this time!
Sep-12-12 13:45:28 start populating Hidden Markov Model with 800,876
records!
Sep-12-12 13:49:06 Finished populating Hidden Markov Model with 800,876
records!
Sep-12-12 13:49:06 Finished populating Hidden Markov Model. HMM-check is
now enabled again!
Sep-12-12 13:49:06 Total processing time: 1,841 second(s)
Sep-12-12 13:49:06 Total processing data: 567.41 MByte
Sep-12-12 13:49:06 building new GripList records and bounce report
Sep-12-12 13:49:06 processing Logfile C:/assp/logs/maillog.txt
Sep-12-12 13:49:11 skipping bounce report because 'DoNotCollectBounces' is
switched ON
Sep-12-12 13:49:12 Uploading Griplist via Direct Connection
Sep-12-12 13:49:13 Submitted 2,910 bytes: 0 IPv6 addresses, 322 IPv4
addresses
Sep-12-12 13:49:13 Trashlist was saved to C:/assp/trashlist.db
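
The summary figures in the log above can be cross-checked with a little arithmetic. The sketch below assumes that the corpus norm is simply the spam weight divided by the not-spam weight, and that the files-per-second figure is the total of imported files divided by the sum of the per-folder run times. Both are inferences from the reported numbers, not from the ASSP source. The function names and the range labels are invented for the example; only the 0.6 to 1.4 range comes from the log.

```python
def corpus_norm(spam_weight: int, ham_weight: int) -> float:
    """Assumed definition: ratio of spam weight to not-spam weight."""
    return spam_weight / ham_weight


def classify_norm(norm: float, low: float = 0.6, high: float = 1.4) -> str:
    """Compare the norm with the recommended range printed in the log."""
    if norm < low:
        return "below range"
    if norm > high:
        return "above range"
    return "in range"


# Figures copied from the log.
norm = corpus_norm(1_598_969, 4_554_517)

# Imported files and run times for errors/spam, errors/notspam, spam, notspam.
imported = 317 + 111 + 4_356 + 12_001
seconds = 8 + 7 + 250 + 1_178
rate = imported / seconds

print(f"{norm:.4f}", classify_norm(norm), f"{rate:.2f}")
```

With the numbers from this run, the script prints 0.3511, "below range", and 11.63, which agrees with the "Corpus norm" and "files per second" lines in the report.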
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally
privileged and protected in law and are intended solely for the use of the
individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no
known virus in this email!
*******************************************************