My system processes ~21,000 files per rebuild, and this takes 2 min 23 seconds using the FileModel (~13 min without). Generating both DBs and storing them in MySQL takes ~11 min.

Nov-10-20 04:00:12 RebuildSpamDB-thread rebuildspamdb-version 8.03 started in ASSP version 2.6.4(20310)
Nov-10-20 04:00:12 Processing... errors/spam with 2,814 files
Nov-10-20 04:00:25 Processing... errors/notspam with 1,126 files
Nov-10-20 04:00:31 Processing... spam with 9,355 files
Nov-10-20 04:01:24 Processing... notspam with 7,712 files
Nov-10-20 04:02:23 Generating weighted Bayesian tuplets
Nov-10-20 04:02:39 start populating Spamdb with 2,163,714 records!
Nov-10-20 04:05:11 Bayesian Pairs: 2,163,714 now in list
Nov-10-20 04:05:15 Generating consolidated Hidden-Markov-Model database from 9,786,151 record model
Nov-10-20 04:06:15 HMM sequences: 4,789,916 now in list
Nov-10-20 04:06:16 generating Spamdb.helo records from 8,629 collected HELO's
Nov-10-20 04:06:17 start populating Hidden Markov Model with 4,789,916 records!
Nov-10-20 04:13:14 Finished populating Hidden Markov Model. HMM-check is now enabled again!
Nov-10-20 04:13:14 Total processing time: 782 second(s)
Nov-10-20 04:13:14 Total processing data: 846.49 MByte
Nov-10-20 04:13:14 Rebuild processed 160.34 files per second.
Nov-10-20 04:13:17 Trashlist was saved to c:/assp/trashlist.db

The system (on Windows 2016) holds as much as possible in RAM. This takes 6.5 GB max and 3 GB on average while running normally, and 1 GB after startup.

assp-process-memory: current: 3064 MB min: 960 MB max 3910 MB

The max is 6,534 MB while the rebuild runs.

> emails (over 20mb) ... Does that impact in-memory usage

Not really. As has always been the case, only 'MaxBytes' of the body of each file are processed, and attachments are hashed.

> I'm always worried about stability,

There is no big difference in stability between Windows and *nix for assp on relatively small installations. To keep everything running well, my assp restarts once a week.
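
The figures in the rebuild log in this message can be cross-checked with a little arithmetic. The sketch below is only an illustration: the timestamps and file counts are copied from the log, while the variable names and the helper `seconds_between` are made up for this example. It suggests that the reported 160.34 files per second appears to refer to the file-processing phase only, not to the total run time.

```python
from datetime import datetime

# File counts copied from the "Processing..." lines of the log.
FILE_COUNTS = {
    "errors/spam": 2814,
    "errors/notspam": 1126,
    "spam": 9355,
    "notspam": 7712,
}

def seconds_between(start, end):
    """Whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

total_files = sum(FILE_COUNTS.values())                # 21007 files
file_phase_s = seconds_between("04:00:12", "04:02:23")  # 131 s of file processing
db_phase_s = seconds_between("04:02:23", "04:13:14")    # 651 s, roughly 11 min
total_s = seconds_between("04:00:12", "04:13:14")       # 782 s, as the log reports

# Files per second over the file-processing phase only.
rate = total_files / file_phase_s

print(f"{total_files} files, {file_phase_s} s file phase, {rate:.2f} files/s")
print(f"DB generation and population: {db_phase_s} s, total: {total_s} s")
```

The small difference between the computed rate and the logged 160.34 is probably due to sub-second timing that the one-second log timestamps do not show.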

> having the emails in memory

Only the rebuild thread (10001) holds 'something' in memory (or BerkeleyDB) while the rebuild runs. These are not files, though; this is the FileModel, which is something like a memory image of each file after it was processed by the rebuild. Processing a file means: analyse the header, collect HELOs, decode MIME, convert to UTF-8, remove disclaimers, hash attachments, collect words, Unicode-normalize words, stem words, and so on, which is very resource and time consuming. Instead of processing every file, the rebuild uses the FileModel content of the file (if available).

Thomas

From: "K Post" <nntp.p...@gmail.com>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date: 09.11.2020 20:40
Subject: Re: [Assp-test] fixes in assp 2.6.4 *SPAM-Evaporator* build 20310

Thanks for the additional information. The in-memory bit keeps the entire corpus in memory, not just the current day's new mail! That makes more sense. So after a restart of the service, we should expect a slower (same speed as previous versions) rebuild, right?

Other than the speed of the rebuild (which takes about 45 minutes on my 30,000k installation fyi), is there any other benefit to the installation from having the emails in memory?

We do get some pretty big emails (over 20mb) many times a day, and I store the whole thing so we can resend if necessary. Does that impact in-memory usage, or does just the first x kb get stored (what's needed for the rebuild)?

My memory usage is usually around 1500mb, but it grows over time, sometimes to 3gb or more. Do you see this same growth? Mind you, I'm on Windows, not *nix. I'm always worried about stability, but it would be nice to have a super fast rebuild....

On Mon, Nov 9, 2020 at 11:17 AM Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

If the required information for an eml-file is not found in the FileModel, the file is processed normally and the info for this file is added to the FileModel.
Information for files that no longer exist is removed from the FileModel.

> 2GB:

This depends on the file count and MaxBytes (and some other settings); it can be much more or much less. I expect 2GB for 30,000 files, a wild guess. My system requires ~1.2GB RAM for 20,000 files.

> Does the new rebuildspam.pm improve rebuild time without enabling RebuildUsesFileModel?

No.

Thomas

From: "K Post" <nntp.p...@gmail.com>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date: 09.11.2020 16:59
Subject: Re: [Assp-test] fixes in assp 2.6.4 *SPAM-Evaporator* build 20310

Exciting changes. Thank you.

On RebuildUsesFileModel, if ASSP crashes or the system is restarted, the RAM storage is obviously lost. Does the new process revert to the old method so that all messages from that day are considered during the rebuild?

You gave 2gb as an example of additional memory for this new feature. What kind of email volume is that based on?

Does the new rebuildspam.pm improve rebuild time without enabling RebuildUsesFileModel?

Thanks again
Ken

On Thu, Nov 5, 2020 at 4:48 AM Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

Hi all,

fixed in assp 2.6.4 *SPAM-Evaporator* build 20310:

- trailing digits in the hostname (like 'mx.microsoft.com 1') in ARC-header lines led to a 'notmatch' for trusted forwarder definitions

changed:

- The rebuildspamdb.pm module is upgraded to version 8.03. It provides faster rebuild processing and much shorter locking times for HMMdb and SpamDB.
- performance improvement for the import/export database feature
- if email addresses and IP addresses are managed using the GUI, a given reason and the date are written to the comment of the modified line
- improved MIME-header fixup for missing boundary definitions
- improved database cache handling

added:

'RebuildUsesFileModel','Build a Model from all processed emails for faster processing'
The rebuild task builds a content model (in memory or BerkeleyDB only) of all processed files and uses this model at the next rebuild for faster processing. The time to process the mail files is reduced to a tenth if BerkeleyDB is not used ( useDB4Rebuild OFF ), but this requires a large amount of additional memory, e.g. 2GB. The time to process the mail files is reduced to a half if BerkeleyDB is used ( useDB4Rebuild ON ).
The default setting is ON.
The first rebuild after setting this to ON will run at normal speed; all subsequent rebuild tasks will run faster.

Thomas

DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally privileged and protected in law and are intended solely for the use of the individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no known virus in this email!
*******************************************************

_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
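
The FileModel behaviour described in this thread (reuse the stored result for a file if available, process a file normally and add it when it is missing, drop entries for files that no longer exist) can be sketched roughly as below. This is not ASSP's code, which is written in Perl; it is a minimal illustration under stated assumptions. The names `process_file`, `rebuild` and `MAX_BYTES`, the choice of modification time and size as the freshness check, and the trivial word collection are all hypothetical.

```python
import os

MAX_BYTES = 10000  # illustrative limit only, standing in for the 'MaxBytes' setting

def process_file(path):
    """Stand-in for the expensive per-file work (header analysis, MIME
    decoding, word collection, ...). Here it only collects lowercase words."""
    with open(path, "rb") as fh:
        body = fh.read(MAX_BYTES)
    return sorted(set(body.decode("utf-8", errors="replace").lower().split()))

def rebuild(paths, model):
    """Return per-file word lists, reusing entries cached in `model`.

    `model` maps path -> ((mtime, size), words) and is updated in place.
    """
    existing = set(paths)
    # Remove information for files that no longer exist.
    for stale in [p for p in model if p not in existing]:
        del model[stale]

    results = {}
    hits = misses = 0
    for path in paths:
        # Assumption: a file is unchanged if its mtime and size are unchanged.
        key = (os.path.getmtime(path), os.path.getsize(path))
        entry = model.get(path)
        if entry is not None and entry[0] == key:
            hits += 1                      # cached result is reused
        else:
            entry = (key, process_file(path))
            model[path] = entry            # processed normally, then stored
            misses += 1
        results[path] = entry[1]
    return results, hits, misses
```

With an empty `model`, the first call processes every file, which matches the note that the first rebuild runs at normal speed; later calls with the same `model` only process new or changed files. Keeping `model` in a plain dictionary also shows why a restart loses the stored information unless it is persisted somewhere.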