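Possibly the spamdb table is damaged anyway?
Going by the timestamps in your log, spamdb is populated at about 5385
records per second (64,620 records in 12 seconds),
while hmmdb is populated at only about 131 records per second (491,755
records in 3,766 seconds).
The speed should be similar for both tables!?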
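
For reference, the two rates can be recomputed from the start/finish
timestamps in the rebuild log quoted further down. This is only a small
sketch: the helper name records_per_second is made up for illustration,
and the timestamps and record counts are copied from the log. Note that
the HMM run spans just over an hour (22:02:04 to 23:04:50).

```python
from datetime import datetime

# Timestamp layout used in the rebuild log, e.g. "Oct-16-13 22:01:08".
LOG_TIME_FORMAT = "%b-%d-%y %H:%M:%S"


def records_per_second(start: str, end: str, records: int) -> float:
    """Return the average insert rate between two log timestamps.

    Hypothetical helper for illustration; not part of ASSP.
    """
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("end must be later than start")
    return records / elapsed


# Values copied from the "start populating" / "Finished populating" lines.
spamdb_rate = records_per_second(
    "Oct-16-13 22:01:08", "Oct-16-13 22:01:20", 64_620
)
hmmdb_rate = records_per_second(
    "Oct-16-13 22:02:04", "Oct-16-13 23:04:50", 491_755
)

print(f"spamdb: {spamdb_rate:.0f} records/s")
print(f"hmmdb:  {hmmdb_rate:.0f} records/s")
```
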
- Look in maillog.txt for a warning or error logged while the spamdb was
being populated.
To solve this:
- stop ASSP
- remove the spamdb table
- start ASSP
- rerun a rebuild, or restore (import) the spamdb from a backup
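
To check the maillog for the first point above, something like the
following could be used. It is a rough sketch under assumptions: I am
guessing that a problem line mentions the table name together with the
word "warning" or "error"; the real wording in ASSP's log may differ.
The function names are made up for illustration, and the default path
is the maillog location shown in the rebuild output.

```python
import re
from typing import Iterable, List

# Assumption: problem lines contain the whole word "warning" or "error".
# Directory names such as "errors/spam" do not match because of the \b.
PROBLEM = re.compile(r"\b(warning|error)\b", re.IGNORECASE)


def find_spamdb_problems(lines: Iterable[str], table: str = "spamdb") -> List[str]:
    """Return the lines that mention the table and look like a warning/error."""
    needle = table.lower()
    return [
        line.rstrip("\n")
        for line in lines
        if needle in line.lower() and PROBLEM.search(line)
    ]


def scan_maillog(path: str = "/usr/local/assp/logs/maillog.txt") -> List[str]:
    """Scan one maillog file; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_spamdb_problems(fh)
```

Calling scan_maillog() on the ASSP host and printing the result should
list any suspicious lines; an empty list only means nothing matched
this simple pattern, not that the table is healthy.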
Thomas
From: Masood Rahim <[email protected]>
To: [email protected]
Date: 17.10.2013 08:30
Subject: [Assp-test] spamdb/hmmdb
Hi,
I'm a little confused by the generation of the spamdb and hmmdb in ASSP.
Both use a MySQL database with the corresponding tables. Right now the
hmmdb seems to be filled with all the records that are supposed to be
there, but the spamdb has only one record, which says
pkey: ***bayesnorm***, pvalue: 100, pfrozen: 0
Why does the hmmdb build correctly but the spamdb does not? Below is the
output from the most recent rebuild run. autoCorrectCorpus is set to:
0.6-1.4-4000-14
Thanks,
Masood
Oct-16-13 21:53:01 RebuildSpamDB-thread rebuildspamdb-version 6.03 started
in ASSP version 2.2.1(12265)
Oct-16-13 21:53:01 RebuildSpamDB will create a Hidden Markov Model!
Oct-16-13 21:53:01 RebuildSpamDB will create unicode enabled databases.
Oct-16-13 21:53:01 RebuildSpamDB process all words as Sequence of UAX #29
Grapheme Clusters.
Oct-16-13 21:53:01 ---ASSP Settings---
Oct-16-13 21:53:01 Do Not Collect Messages with RedListed address: Enabled
**Messages with RedListed addresses will be removed from the corpus!**
Oct-16-13 21:53:01 Do Not Collect RedRe Messages: Enabled
**Messages matching the RedRe will be removed from the corpus!**
Oct-16-13 21:53:01 Use Subject as Maillog Names: True
Oct-16-13 21:53:01 Maxbytes: 6000
Oct-16-13 21:53:01 RebuildFileTimeLimit: 1 5
Oct-16-13 21:53:01 RebuildFileTimeLimit: files will be moved away from the
corpus, if there processing takes longer than 5 second(s)
Oct-16-13 21:53:01 /usr/local/assp/errors/spam
Oct-16-13 21:53:01 File Count: 2
Oct-16-13 21:53:01 Processing... errors/spam with 2 files
Oct-16-13 21:53:01 ignore and remove files older than Jan-20-11 20:53:01
in folder errors/spam
Oct-16-13 21:53:01 Imported Files: 0
Oct-16-13 21:53:01 Finished in 1 second(s)
Oct-16-13 21:53:01 /usr/local/assp/errors/notspam
Oct-16-13 21:53:01 File Count: 56
Oct-16-13 21:53:01 Processing... errors/notspam with 56 files
Oct-16-13 21:53:01 ignore and remove files older than Jan-20-11 20:53:01
in folder errors/notspam
Oct-16-13 21:53:03 Imported Files: 54
Oct-16-13 21:53:03 Finished in 2 second(s)
Oct-16-13 21:53:03 info: corpusnorm after processing errors/spam and
errors/notspam is Spam Weight: 0 / Not-Spam Weight: 41560 => norm: 0.0001
Oct-16-13 21:53:03 info: require 1905 files from folder spam to get the
wanted corpusnorm (1)
Oct-16-13 21:53:03 /usr/local/assp/spam
Oct-16-13 21:53:03 File Count: 2,567
Oct-16-13 21:53:03 Processing... spam with 1,905 files
Oct-16-13 21:53:03 ignore and remove files older than Sep-15-13 21:53:03
in folder spam
Oct-16-13 21:57:04 Imported Files: 1,906
Oct-16-13 21:57:04 Finished in 241 second(s)
Oct-16-13 21:57:04 info: require all files from folder notspam to get the
wanted corpusnorm (1)
Oct-16-13 21:57:04 /usr/local/assp/notspam
Oct-16-13 21:57:04 File Count: 2,054
Oct-16-13 21:57:04 Processing... notspam with 2,054 files
Oct-16-13 21:57:04 ignore and remove files older than Sep-15-13 21:57:04
in folder notspam
Oct-16-13 22:01:01 Imported Files: 2,052
Oct-16-13 22:01:01 Finished in 237 second(s)
Oct-16-13 22:01:01 Rebuild processed 8.25 files per second. Good values
are 12 files per second and higher. You can speed up the rebuild process,
using a cached (>=128MB) IO-controller or a RAM-disk with at least 250.00
MByte for the folder '/usr/local/assp/tmpDB'.
Oct-16-13 22:01:01 Generating weighted Bayesian tuplets
Oct-16-13 22:01:08 start populating Spamdb with 64,620 records - Bayesian
check is now disabled!
Oct-16-13 22:01:20 Finished populating Spamdb with 64,620 records -
Bayesian check is now enabled!
Oct-16-13 22:01:20 done - Generating weighted Bayesian tuplets
Oct-16-13 22:01:20 Bayesian Pairs: 64,620 now in list
Oct-16-13 22:01:20 Generating consolidated Hidden-Markov-Model database
from 1,039,753 record model
Oct-16-13 22:01:59 HMM sequences: 491,755 now in list
Oct-16-13 22:01:59 generating Spamdb.helo records from 0 collected HELO's
Oct-16-13 22:01:59 cleaning old Spamdb.helo records
Oct-16-13 22:01:59 done - cleaning old Spamdb.helo records
Oct-16-13 22:01:59 HELO Blacklist: 0 new, 0 now in list
Oct-16-13 22:01:59 Spam Weight: 1,189,312
Oct-16-13 22:01:59 Not-Spam Weight: 976,242
Oct-16-13 22:01:59 Corpus norm: 1.2183 - (ok - slighly spam
heavy)
Oct-16-13 22:01:59 Corpus confidence: 0.67378816
Oct-16-13 22:02:04 Start populating Hidden Markov Model. HMM-check is
disabled for this time!
Oct-16-13 22:02:04 start populating Hidden Markov Model with 491,755
records!
Oct-16-13 23:04:50 Finished populating Hidden Markov Model with 491,755
records!
Oct-16-13 23:04:50 Finished populating Hidden Markov Model. HMM-check is
now enabled again!
Oct-16-13 23:04:50 Total processing time: 4,309 second(s)
Oct-16-13 23:04:50 Total processing data: 32.33 MByte
Oct-16-13 23:04:50 building new GripList records and bounce report
Oct-16-13 23:04:50 processing Logfile /usr/local/assp/logs/maillog.txt
Oct-16-13 23:05:19 processing Logfile
/usr/local/assp/logs/13-10-15.maillog.txt
Oct-16-13 23:06:06 processing Logfile
/usr/local/assp/logs/13-10-14.maillog.txt
Oct-16-13 23:06:20 processing Logfile
/usr/local/assp/logs/13-10-13.maillog.txt
Oct-16-13 23:06:53 processing Logfile
/usr/local/assp/logs/13-10-12.maillog.txt
Oct-16-13 23:07:06 skipping bounce report because 'DoNotCollectBounces' is
switched ON
Oct-16-13 23:07:07 Uploading Griplist via Direct Connection
Oct-16-13 23:07:08 Submitted 21,837 bytes: 0 IPv6 addresses, 2,425 IPv4
addresses
Oct-16-13 23:07:08 Trashlist was saved to /usr/local/assp/trashlist.db
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test