My system processes ~21,000 files per rebuild, and this takes 2 min 23 
seconds using the FileModel (~13 min without).
Generating both DBs and storing them in MySQL takes ~11 min.

Nov-10-20 04:00:12 RebuildSpamDB-thread rebuildspamdb-version 8.03 started in ASSP version 2.6.4(20310)
Nov-10-20 04:00:12 Processing... errors/spam with 2,814 files
Nov-10-20 04:00:25 Processing... errors/notspam with 1,126 files
Nov-10-20 04:00:31 Processing... spam with 9,355 files
Nov-10-20 04:01:24 Processing... notspam with 7,712 files
Nov-10-20 04:02:23 Generating weighted Bayesian tuplets
Nov-10-20 04:02:39 start populating Spamdb with 2,163,714 records!
Nov-10-20 04:05:11 Bayesian Pairs: 2,163,714 now in list
Nov-10-20 04:05:15 Generating consolidated Hidden-Markov-Model database from 9,786,151 record model
Nov-10-20 04:06:15 HMM sequences: 4,789,916 now in list
Nov-10-20 04:06:16 generating Spamdb.helo records from 8,629 collected HELO's
Nov-10-20 04:06:17 start populating Hidden Markov Model with 4,789,916 records!
Nov-10-20 04:13:14 Finished populating Hidden Markov Model. HMM-check is now enabled again!
Nov-10-20 04:13:14 Total processing time: 782 second(s)
Nov-10-20 04:13:14 Total processing data: 846.49 MByte
Nov-10-20 04:13:14 Rebuild processed 160.34 files per second.
Nov-10-20 04:13:17 Trashlist was saved to c:/assp/trashlist.db
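
As a rough cross-check of the log above, the phase durations can be recomputed from the timestamps. This is a small Python sketch, not part of ASSP; the phase labels are my own shorthand, and each duration runs from one listed timestamp to the next.

```python
from datetime import datetime

# Timestamps copied from the log above; labels are informal shorthand.
LOG = [
    ("Nov-10-20 04:00:12", "file processing"),
    ("Nov-10-20 04:02:23", "generating weighted Bayesian tuplets"),
    ("Nov-10-20 04:02:39", "populating Spamdb"),
    ("Nov-10-20 04:05:15", "consolidating HMM database"),
    ("Nov-10-20 04:06:17", "populating Hidden Markov Model"),
    ("Nov-10-20 04:13:14", "finished"),
]


def parse(ts):
    # %b assumes English month abbreviations (the default C locale)
    return datetime.strptime(ts, "%b-%d-%y %H:%M:%S")


def phase_durations(log):
    """Return (label, seconds) for each phase, measured to the next timestamp."""
    times = [parse(ts) for ts, _ in log]
    return [
        (label, int((later - earlier).total_seconds()))
        for (_, label), earlier, later in zip(log, times, times[1:])
    ]


durations = phase_durations(LOG)
total_seconds = sum(seconds for _, seconds in durations)

# File counts from the four "Processing..." lines
file_count = 2814 + 1126 + 9355 + 7712
file_seconds = durations[0][1]
db_seconds = total_seconds - file_seconds

for label, seconds in durations:
    print(f"{seconds:4d} s  {label}")
print(f"{file_count} files, {file_count / file_seconds:.1f} files per second")
```

The total matches the logged 782 seconds, and the time after file processing (`db_seconds`) comes to a little under 11 minutes, in line with the ~11 min quoted for generating and storing both databases.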

The system (on Windows 2016) holds as much as possible in RAM. This takes 
6.5 GB max and 3 GB on average while running normally - 1 GB after startup.

assp-process-memory:    current: 3064 MB        min: 960 MB     max 3910 MB
max is 6,534 MB while the rebuild runs
 
> emails (over 20mb)  ... Does that impact in-memory usage

Not really.
As always, only the first 'MaxBytes' bytes of the body of each file are 
processed, and attachments are hashed.
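
To illustrate why a 20 MB mail need not cost 20 MB in the model, here is a minimal Python sketch of the idea (ASSP itself is Perl and its actual code is not shown here). The value of `MAX_BYTES`, the choice of SHA-256 and the function names are assumptions for illustration only.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage

MAX_BYTES = 4000  # hypothetical value; the real limit is the ASSP 'MaxBytes' setting


def summarize_message(raw, max_bytes=MAX_BYTES):
    """Return (body text limited to max_bytes, list of attachment digests)."""
    msg = message_from_bytes(raw)
    body = b""
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        if part.get_filename():
            # Attachment: keep only a fixed-size digest, whatever its size.
            # SHA-256 is an assumption; the thread does not name the hash.
            digests.append(hashlib.sha256(payload).hexdigest())
        elif part.get_content_maintype() == "text":
            body += payload
    return body[:max_bytes], digests


def build_sample(attachment_size):
    """Build a test mail with a 12,000-byte text body and one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content("hello\n" * 2000)
    msg.add_attachment(b"x" * attachment_size, maintype="application",
                       subtype="octet-stream", filename="big.bin")
    return msg.as_bytes()


body, digests = summarize_message(build_sample(1_000_000))
print(len(body), len(digests), len(digests[0]))
```

Whatever the attachment size, the kept data is bounded by `max_bytes` plus one short digest per attachment.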

>I'm always worried about stability,

There is no big difference in stability between Windows and *nix for ASSP 
on relatively small installations. To keep everything running well, my ASSP 
restarts once a week.

>having the emails in memory

Only the rebuild thread (10001) holds 'something' in memory (or 
BerkeleyDB) while the rebuild runs - but these are not files. This is the 
FileModel, which is something like a memory image of each file after it 
was processed by the rebuild.
Processing a file means: analyse the header, collect HELOs, decode MIME, 
convert to UTF-8, remove disclaimers, hash attachments, collect words, 
Unicode-normalize words, stem words ... - which is very resource- and 
time-consuming.
Instead of processing every file, the rebuild uses the FileModel content 
of the file (if available).
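
To give a feel for the word-level steps in that list, here is a small Python sketch of "collect words" and "Unicode-normalize words". The NFKC form, the lowercasing and the `\w+` tokenizer are assumptions, since the thread does not say which ASSP uses; stemming and the other steps are left out.

```python
import re
import unicodedata

WORD_RE = re.compile(r"\w+")


def collect_words(text):
    """Normalize text (NFKC + lowercase, both assumed) and split it into words."""
    normalized = unicodedata.normalize("NFKC", text).lower()
    return WORD_RE.findall(normalized)


# Full-width letters fold onto their plain ASCII equivalents
print(collect_words("Ｆｒｅｅ ｏｆｆｅｒ, FREE offer!"))
```

Doing this for every word of every mail at each rebuild is the kind of repeated work that a stored per-file result avoids.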

Thomas






From:    "K Post" <nntp.p...@gmail.com>
To:      "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date:    09.11.2020 20:40
Subject: Re: [Assp-test] fixes in assp 2.6.4 *SPAM-Evaporator* build 20310



Thanks for the additional information.  The in-memory bit keeps the entire 
corpus in memory, not just the current day's new mail!  That makes more 
sense.  So after a restart of the service, we should expect a slower (same 
speed as previous versions) rebuild right?

Other than the speed of the rebuild (which takes about 45 minutes on my 
30,000k installation fyi), is there any other benefit to the installation 
by having the emails in memory?  

We do get some pretty big emails (over 20mb) many times a day, and I store 
the whole thing so we can resend if necessary.  Does that impact in-memory 
usage or does just the first x kb get stored - what's needed for the 
rebuild?

My memory usage is usually around 1500mb, but it grows over time, 
sometimes to 3gb or more.  Do you see this same growth?  Mind you, I'm on 
Windows, not *nix.  I'm always worried about stability, but it would be 
nice to have a super fast rebuild....

On Mon, Nov 9, 2020 at 11:17 AM Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
If the required information for an eml file is not found in the FileModel, 
the file is processed normally and the info for this file is added to the 
FileModel. 
Information for files that no longer exist is removed from the FileModel. 
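
The behaviour described here is essentially a per-file cache. A minimal Python sketch of that logic follows. It assumes the model is a plain dict keyed by file path and that `process_file` stands for the expensive per-file work; both are illustrative choices, not how rebuildspamdb.pm is actually written.

```python
def rebuild_with_model(current_files, model, process_file):
    """Reuse cached results, process missing files, drop entries for deleted files.

    Returns (hits, misses, removed) so the effect of the model is visible.
    """
    hits = misses = 0
    for path in current_files:
        if path in model:
            hits += 1                          # cached result is reused
        else:
            model[path] = process_file(path)   # processed normally, then stored
            misses += 1
    stale = set(model) - set(current_files)
    for path in stale:
        del model[path]                        # file no longer exists in the corpus
    return hits, misses, len(stale)


model = {}
fake_process = str.upper  # stand-in for the expensive per-file processing

print(rebuild_with_model(["a.eml", "b.eml", "c.eml"], model, fake_process))
print(rebuild_with_model(["b.eml", "c.eml", "d.eml"], model, fake_process))
```

With an empty model (for example after a restart when the model was held in RAM), every file is a miss and the rebuild runs at normal speed; later rebuilds only pay for new files.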

>2GB: 
This depends on the file count and MaxBytes (and some other factors) - it 
can be much more or much less. I expect 2 GB for 30,000 files - a wild guess. 
My system requires ~1.2 GB RAM for 20,000 files. 
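
For a back-of-the-envelope figure, the 1.2 GB for 20,000 files can be scaled linearly. The linear scaling is purely an assumption for illustration (as noted, MaxBytes and other factors change the result), and the function name is made up.

```python
def estimate_model_ram_gb(file_count, ref_files=20_000, ref_gb=1.2):
    """Scale the reference point (1.2 GB for 20,000 files) linearly."""
    return ref_gb * file_count / ref_files


# Decimal units: 1 GB = 1,000,000 KB
per_file_kb = 1.2 * 1_000_000 / 20_000
estimate_30k = estimate_model_ram_gb(30_000)

print(f"about {per_file_kb:.0f} KB per file")
print(f"about {estimate_30k:.1f} GB for 30,000 files")
```

That lands a little below the 2 GB guess, which leaves some headroom for corpora with larger MaxBytes or longer mails.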

>Does the new rebuildspam.pm improve rebuild time without enabling 
RebuildUsesFileModel? 

No. 

Thomas 






From:    "K Post" <nntp.p...@gmail.com> 
To:      "ASSP development mailing list" <assp-test@lists.sourceforge.net> 
Date:    09.11.2020 16:59 
Subject: Re: [Assp-test] fixes in assp 2.6.4 *SPAM-Evaporator* build 20310 



Exciting changes. Thank you. 
On RebuildUsesFileModel, if ASSP crashes or the system is restarted, the 
RAM storage is obviously lost.  Does the new process revert to the old 
method so that all messages from that day are considered during the 
rebuild? 
You gave 2gb as an example of additional memory for this new feature.  
What kind of email volume is that based on?   
Does the new rebuildspam.pm improve rebuild time without enabling 
RebuildUsesFileModel? 

Thanks again 
Ken 
  
   

On Thu, Nov 5, 2020 at 4:48 AM Thomas Eckardt <thomas.ecka...@thockar.com> 
wrote: 
Hi all, 

fixed in assp 2.6.4 *SPAM-Evaporator* build 20310: 

- trailing digits in the hostname (like 'mx.microsoft.com 1') in 
ARC-header lines led to a 'notmatch' for trusted forwarder definitions 



changed: 

- The rebuildspamdb.pm module is upgraded to version 8.03. It provides 
faster rebuild processing, and much shorter locking times for HMMdb and 
SpamDB. 

- performance improvement for the import/export database feature 

- if email addresses and IP-addresses are managed using the GUI, a given 
reason and the date are written to the comment of the modified line 

- improved MIME-header fixup for missing boundary definitions 

- improved database cache handling 


added: 

'RebuildUsesFileModel','Build a Model from all processed emails for faster 
processing' 

 The rebuild task builds a content model (in memory or BerkeleyDB only) of 
all processed files, and uses this model at the next rebuild for faster 
processing. 
 The time to process the mail-files is reduced to a tenth (if 
BerkeleyDB is not used ( useDB4Rebuild OFF )), but this requires a large 
amount of additional memory - e.g. 2GB. 
 The time to process the mail-files is reduced to a half, if BerkeleyDB is 
used ( useDB4Rebuild ON ). 
 The default setting is ON 
 The first rebuild after setting this to ON will run at a normal speed - 
all the next rebuild tasks will run faster. 



Thomas

DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally 
privileged and protected in law and are intended solely for the use of the 

individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no 
known virus in this email!
*******************************************************

_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test