Hi,

I'm a little confused about how the spamdb and hmmdb are generated in ASSP.
Both use a MySQL database with the corresponding tables.  Right now the
hmmdb is filled with all the records that are supposed to be there, but the
spamdb contains only one record: pkey: ***bayesnorm***, pvalue: 100,
pfrozen: 0
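
One way to confirm the mismatch independently of the ASSP GUI is to count the rows in both tables directly. The following is only a sketch: it assumes the tables are literally named spamdb and hmmdb and have the pkey/pvalue/pfrozen columns quoted above, and the helper name table_row_counts is made up for illustration. The function accepts any Python DB-API 2.0 connection; the demo uses an in-memory SQLite database as a stand-in so it runs without a MySQL server.

```python
import sqlite3


def table_row_counts(conn, tables=("spamdb", "hmmdb")):
    """Return {table_name: row_count} for a DB-API 2.0 connection.

    The table names are an assumption based on this post, not taken
    from the ASSP source.
    """
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names cannot be bound as query parameters, so they are
        # interpolated; they come only from the fixed tuple above.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


def demo():
    """Build a stand-in database that mimics the situation described."""
    conn = sqlite3.connect(":memory:")
    for table in ("spamdb", "hmmdb"):
        conn.execute(
            f"CREATE TABLE {table} "
            "(pkey TEXT PRIMARY KEY, pvalue TEXT, pfrozen INTEGER)"
        )
    # spamdb holds only the single norm record, as reported above.
    conn.execute("INSERT INTO spamdb VALUES ('***bayesnorm***', '100', 0)")
    # hmmdb holds ordinary (here: invented placeholder) records.
    conn.executemany(
        "INSERT INTO hmmdb VALUES (?, ?, 0)",
        [("seq-a", "0.9"), ("seq-b", "0.1")],
    )
    return table_row_counts(conn)


if __name__ == "__main__":
    print(demo())
```

Against the real database, the same function could be passed a connection from whichever MySQL driver is installed. A spamdb count far below the 64,620 records reported in the rebuild log would confirm that the records are not reaching the table.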

Why does the hmmdb build correctly while the spamdb does not?  Below is the
output from the most recent rebuild run.  autoCorrectCorpus is set to:
0.6-1.4-4000-14

Thanks,

Masood



Oct-16-13 21:53:01 RebuildSpamDB-thread rebuildspamdb-version 6.03 started in 
ASSP version 2.2.1(12265)

Oct-16-13 21:53:01 RebuildSpamDB will create a Hidden Markov Model!

Oct-16-13 21:53:01 RebuildSpamDB will create unicode enabled databases.

Oct-16-13 21:53:01 RebuildSpamDB process all words as Sequence of UAX #29 
Grapheme Clusters.

Oct-16-13 21:53:01 ---ASSP Settings---
Oct-16-13 21:53:01 Do Not Collect Messages with RedListed address: Enabled
**Messages with RedListed addresses will be removed from the corpus!**

Oct-16-13 21:53:01 Do Not Collect RedRe Messages: Enabled
**Messages matching the RedRe will be removed from the corpus!**

Oct-16-13 21:53:01 Use Subject as Maillog Names: True
Oct-16-13 21:53:01 Maxbytes: 6000 
Oct-16-13 21:53:01 RebuildFileTimeLimit: 1 5 
Oct-16-13 21:53:01 RebuildFileTimeLimit: files will be moved away from the 
corpus, if there processing takes longer than 5 second(s) 

Oct-16-13 21:53:01 /usr/local/assp/errors/spam
Oct-16-13 21:53:01 File Count:  2
Oct-16-13 21:53:01 Processing... errors/spam with 2 files
Oct-16-13 21:53:01 ignore and remove files older than Jan-20-11 20:53:01 in 
folder errors/spam
Oct-16-13 21:53:01 Imported Files:      0
Oct-16-13 21:53:01 Finished in 1 second(s)

Oct-16-13 21:53:01 /usr/local/assp/errors/notspam
Oct-16-13 21:53:01 File Count:  56
Oct-16-13 21:53:01 Processing... errors/notspam with 56 files
Oct-16-13 21:53:01 ignore and remove files older than Jan-20-11 20:53:01 in 
folder errors/notspam
Oct-16-13 21:53:03 Imported Files:      54
Oct-16-13 21:53:03 Finished in 2 second(s)
Oct-16-13 21:53:03 info: corpusnorm after processing errors/spam and 
errors/notspam is Spam Weight: 0 / Not-Spam Weight: 41560 => norm: 0.0001 
Oct-16-13 21:53:03 info: require 1905 files from folder spam to get the wanted 
corpusnorm (1)

Oct-16-13 21:53:03 /usr/local/assp/spam
Oct-16-13 21:53:03 File Count:  2,567
Oct-16-13 21:53:03 Processing... spam with 1,905 files
Oct-16-13 21:53:03 ignore and remove files older than Sep-15-13 21:53:03 in 
folder spam
Oct-16-13 21:57:04 Imported Files:      1,906
Oct-16-13 21:57:04 Finished in 241 second(s)
Oct-16-13 21:57:04 info: require all files from folder notspam to get the 
wanted corpusnorm (1)

Oct-16-13 21:57:04 /usr/local/assp/notspam
Oct-16-13 21:57:04 File Count:  2,054
Oct-16-13 21:57:04 Processing... notspam with 2,054 files
Oct-16-13 21:57:04 ignore and remove files older than Sep-15-13 21:57:04 in 
folder notspam
Oct-16-13 22:01:01 Imported Files:      2,052
Oct-16-13 22:01:01 Finished in 237 second(s)

Oct-16-13 22:01:01 Rebuild processed 8.25 files per second. Good values are 12 
files per second and higher. You can speed up the rebuild process, using a 
cached (>=128MB) IO-controller or a RAM-disk with at least 250.00 MByte for the 
folder '/usr/local/assp/tmpDB'.

Oct-16-13 22:01:01 Generating weighted Bayesian tuplets
Oct-16-13 22:01:08 start populating Spamdb with 64,620 records - Bayesian check 
is now disabled!
Oct-16-13 22:01:20 Finished populating Spamdb with 64,620 records - Bayesian 
check is now enabled!
Oct-16-13 22:01:20 done - Generating weighted Bayesian tuplets

Oct-16-13 22:01:20 Bayesian Pairs: 64,620 now in list

Oct-16-13 22:01:20 Generating consolidated Hidden-Markov-Model database from 
1,039,753 record model
Oct-16-13 22:01:59 HMM sequences: 491,755 now in list

Oct-16-13 22:01:59 generating Spamdb.helo records from 0 collected HELO's
Oct-16-13 22:01:59 cleaning old Spamdb.helo records
Oct-16-13 22:01:59 done - cleaning old Spamdb.helo records

Oct-16-13 22:01:59 HELO Blacklist: 0 new, 0 now in list

Oct-16-13 22:01:59 Spam Weight:    1,189,312
Oct-16-13 22:01:59 Not-Spam Weight:   976,242

Oct-16-13 22:01:59 Corpus norm: 1.2183 - (ok - slighly spam heavy)
Oct-16-13 22:01:59 Corpus confidence:   0.67378816

Oct-16-13 22:02:04 Start populating Hidden Markov Model. HMM-check is disabled 
for this time!
Oct-16-13 22:02:04 start populating Hidden Markov Model with 491,755 records!
Oct-16-13 23:04:50 Finished populating Hidden Markov Model with 491,755 records!
Oct-16-13 23:04:50 Finished populating Hidden Markov Model. HMM-check is now 
enabled again!

Oct-16-13 23:04:50 Total processing time: 4,309 second(s)

Oct-16-13 23:04:50 Total processing data: 32.33 MByte

Oct-16-13 23:04:50 building new GripList records and bounce report
Oct-16-13 23:04:50 processing Logfile /usr/local/assp/logs/maillog.txt
Oct-16-13 23:05:19 processing Logfile /usr/local/assp/logs/13-10-15.maillog.txt
Oct-16-13 23:06:06 processing Logfile /usr/local/assp/logs/13-10-14.maillog.txt
Oct-16-13 23:06:20 processing Logfile /usr/local/assp/logs/13-10-13.maillog.txt
Oct-16-13 23:06:53 processing Logfile /usr/local/assp/logs/13-10-12.maillog.txt

Oct-16-13 23:07:06 skipping bounce report because 'DoNotCollectBounces' is 
switched ON

Oct-16-13 23:07:07 Uploading Griplist via Direct Connection
Oct-16-13 23:07:08 Submitted 21,837 bytes: 0 IPv6 addresses, 2,425 IPv4 
addresses

Oct-16-13 23:07:08 Trashlist was saved to /usr/local/assp/trashlist.db
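
For reference, the "Corpus norm" figures in the log above are consistent with a plain ratio of Spam Weight to Not-Spam Weight. The sketch below reproduces both values. The 0.0001 lower bound is inferred from the "Spam Weight: 0 ... => norm: 0.0001" line and is an assumption, not something taken from the ASSP source; the function name corpus_norm is likewise made up for illustration.

```python
def corpus_norm(spam_weight, notspam_weight, floor=0.0001):
    """Ratio of spam to not-spam weight, rounded to four decimals.

    `floor` is an assumed lower bound inferred from the log output.
    """
    if notspam_weight <= 0:
        raise ValueError("not-spam weight must be positive")
    ratio = spam_weight / notspam_weight
    return round(max(ratio, floor), 4)


if __name__ == "__main__":
    # After errors/spam and errors/notspam only:
    print(corpus_norm(0, 41560))
    # Final weights reported at the end of the rebuild:
    print(corpus_norm(1189312, 976242))
```

With the weights from the log this gives 0.0001 and 1.2183, matching the reported values. The corpus itself therefore looks balanced, and the problem appears to be limited to what ends up in the spamdb table.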

           

_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test
