>For the first two to three thousand files the rebuild goes through 
really quickly. Somewhere after that it starts to slow down 

First: what is your setting for 'useDB4rebuild', and is BerkeleyDB 
installed?
If 'useDB4rebuild' is set but BerkeleyDB is not installed, DB_File will 
be used, which gets slower and slower as the DB fills up and is about 
10 times slower than BerkeleyDB.
If 'useDB4rebuild' is set but neither BerkeleyDB nor DB_File is installed, 
the internal 'orderedtie' is used, which is 100 or more times slower than 
BerkeleyDB.
If 'useDB4rebuild' is not set, the rebuild will use a large amount of 
memory but will run very fast.
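
The fallback order described above can be sketched roughly as follows. This is only an illustration of the decision logic, not ASSP code: the function name, its parameters and the cost table are made up for this example, and the factors are the rough estimates given above.

```python
# Illustrative sketch only: pick_rebuild_backend and RELATIVE_COST are
# hypothetical names, not part of ASSP.

def pick_rebuild_backend(use_db4_rebuild, has_berkeleydb, has_db_file):
    """Return which storage the rebuild would end up using."""
    if not use_db4_rebuild:
        return "memory"        # fast, but needs a large amount of RAM
    if has_berkeleydb:
        return "BerkeleyDB"    # the fast on-disk option
    if has_db_file:
        return "DB_File"       # about 10x slower, degrades as it fills
    return "orderedtie"        # internal fallback, 100x or more slower

# Rough slowdown relative to BerkeleyDB, as estimated in this mail.
RELATIVE_COST = {"BerkeleyDB": 1, "DB_File": 10, "orderedtie": 100}

backend = pick_rebuild_backend(True, False, True)
print(backend, RELATIVE_COST[backend])
```

If the result is anything other than "memory" or "BerkeleyDB", installing BerkeleyDB is the first thing to check.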

>Once it completes the 
spam folder it moves to the notspam folder and starts off quicker again.

This behavior is mostly normal and depends on the contents of the corpus 
files.

These are some values from my slow prod system.

Apr-10-12 04:00:01 c:/assp/errors/spam
Apr-10-12 04:00:01 File Count:           794
Apr-10-12 04:02:28 Finished in 147 second(s)
~5.4 files/s

Apr-10-12 04:02:28 c:/assp/errors/notspam
Apr-10-12 04:02:28 File Count:           480
Apr-10-12 04:03:55 Finished in 87 second(s)
~5.5 files/s

Apr-10-12 04:03:55 c:/assp/spam
Apr-10-12 04:03:55 File Count:           1,839
Apr-10-12 04:07:11 Finished in 196 second(s)
~9.4 files/s

Apr-10-12 04:07:11 c:/assp/notspam
Apr-10-12 04:07:11 File Count:           679
Apr-10-12 04:08:48 Finished in 97 second(s)
~7.0 files/s

Apr-10-12 04:08:48 Rebuild processed 7.13 files per second. Good values 
are 10 files per second and higher.

As you can see, the files in the errors folders are processed more slowly, 
because the size of the analyzed data is twice the size in the other two 
folders.
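
The per-folder rates quoted above are simply file count divided by elapsed seconds. A small sketch with the numbers from the log (the variable and function names are made up for this example):

```python
# (folder, file count, elapsed seconds), taken from the log excerpt above.
runs = [
    ("errors/spam", 794, 147),
    ("errors/notspam", 480, 87),
    ("spam", 1839, 196),
    ("notspam", 679, 97),
]

def files_per_second(files, seconds):
    """Throughput of one folder pass."""
    return files / seconds

rates = {name: round(files_per_second(f, s), 1) for name, f, s in runs}

# Overall rate over the four folder passes only.
total_files = sum(f for _, f, _ in runs)
total_seconds = sum(s for _, _, s in runs)
overall = round(files_per_second(total_files, total_seconds), 1)

print(rates)
print(overall)
```

The overall figure computed this way comes out slightly above the 7.13 reported in the log, presumably because the reported value also covers time spent outside the four folder passes (that is an assumption, the excerpt does not show it).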

You may also try reducing 'MaxFiles'. It should be at least as large as 
the number of files in the largest error folder. In most cases it makes 
little difference to the confidence of the resulting databases whether 
the rebuild processes 7000 or 14000 files (this may or may not be true!).
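
As a rough sketch of that lower bound, using the file counts of the two error folders from the log above (the function name is hypothetical, and on a real system the counts would have to be taken from the folders themselves):

```python
# File counts of the error folders, taken from the log excerpt above.
error_folder_counts = {
    "errors/spam": 794,
    "errors/notspam": 480,
}

def min_max_files(counts):
    """Smallest sensible 'MaxFiles': the size of the largest error folder."""
    return max(counts.values())

lower_bound = min_max_files(error_folder_counts)
print(lower_bound)
```

Anything between this lower bound and the current setting is a candidate value. How far 'MaxFiles' can be reduced without hurting the results has to be tested on the system in question.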

>sit around 1.9GB usage
>According to other discussions I have 
>seen here that may be a bit on the high side as they aren't particularly 
>high volume so I'm wondering if there's anything I can do to improve 
>that.

If you have switched all the main hashes and lists to MySQL and all the 
'useDB4....' config parameters are set, then everything has been done to 
reduce memory usage to a minimum.

I'm running exactly this config with all plugins fully in use and 5 
workers, and the system never uses more than 500 MB of memory.

Thomas



From:    Colin <a...@lanternhosting.co.uk>
To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
Date:    09.04.2012 19:54
Subject: Re: [Assp-test] Antwort:  Multiple instances and rebuildspamdb



Thanks for the comments,

I have been running the config sync ever since I started using version 2. 
I just haven't had the spare resources for a separate mysql server.

Seeing the recent comments about memory usage I have been able to drop 
to 32bit vms with half as much memory. I was previously running two 
64bit vms with 4GB RAM that sat around 3.6GB usage. I now run two 32bit 
VMs that sit around 1.9GB usage. According to other discussions I have 
seen here that may be a bit on the high side as they aren't particularly 
high volume so I'm wondering if there's anything I can do to improve that.

I am however continuing to experience slowness with rebuildspamdb. When 
there is a relatively small corpus (a few hundred files), it chugs 
through nice and quickly at an average of 0.05 seconds per message. By 
watching the status page I can see issues when there is a larger corpus. 
For the first two to three thousand files the rebuild goes through 
really quickly. Somewhere after that it starts to slow down and gets 
slower the more files there are. I have the corpus set to 14000 and by 
the time it gets to there the processing time can get up to 20 seconds 
per message.

This pattern is for each corpus folder rather than rebuildspamdb as a 
whole. By that I mean that the first few thousand messages in the spam 
folder are processed quickly then it gets slower. Once it completes the 
spam folder it moves to the notspam folder and starts off quicker again.

CPU usage goes up and down with each message that is being processed but 
doesn't seem to max out. Same with disk usage. The run that has just 
finished has ASSP going up to 35% memory usage of the VM. It took 3 
hours, 56 minutes and 5 seconds to complete.

MaxFiles: 14000
MaxBytes: 4000

Any suggestions to help speed it up?

All the best,
Colin.


On 09/04/2012 09:57, Thomas Eckardt wrote:
> Hi Colin,
>
>> 1) Are there any recommended configuration optimisations for mysql that
>> people have found to help with ASSP?
>
> One assp should run as a DB-master, the other(s) as DB-slave
> ('mysqlSlaveMode').
> The DB-master should be the one with the lower workload, because DB-slaves
> do no DB maintenance. The rebuild should also run on the master, to
> prevent access violations.
>
>> Would I be right in saying that as I now have mysql I only
>> need to run rebuildspamdb on one of the ASSP instances?
>
> Yes, that's right. However, there is a small issue with the shared
> corpus. All file names have the trailing '--counter'. To prevent one assp
> from overwriting a corpus file stored by another, the counter should be
> set differently on each assp. There is currently no GUI option to do
> this. I would set one assp to a high starting 'counter', e.g. 1000000.
> To do this:
>
> - stop this assp
> - delete the folder assp/tmpDB/Stats (if there is one)
> - open the file 'asspstats.sav' in an editor and change the number after
> the tag 'counter' to (e.g.) 1000000. Be careful: change only the number
> (no CR, LF, dot, comma or anything else)
> - start assp
>
> Now both assp will generate different file names in the corpus, even if
> both are getting mails with the same subject.
>
> To make it easier to manage both assp configurations (settings, regex
> files ....), I recommend configuring the config sync feature (read the
> GUI !!).
>
> Thomas
>
>
>
>
>
>
> From:    Colin<a...@lanternhosting.co.uk>
> To:      ASSP development mailing list<assp-test@lists.sourceforge.net>
> Date:    08.04.2012 19:45
> Subject: [Assp-test] Multiple instances and rebuildspamdb
>
>
>
> Hi folks,
>
> Further to my recent emails, I scrapped previous attempts and installed
> the new Ubuntu 12.04 as 32bit with 32bit perl 5.14.
>
> So, now I have two front end servers running ASSP and Exim and I have
> finally managed to get round to building a separate 64bit mysql server.
> Both ASSP instances connect successfully to mysql so I have a couple of
> simple questions.
>
> 1) Are there any recommended configuration optimisations for mysql that
> people have found to help with ASSP?
> 2) Both my ASSP instances save to a shared corpus using subject as
> filenames. Would I be right in saying that as I now have mysql I only
> need to run rebuildspamdb on one of the ASSP instances?
>
> Thanks,
> Colin.
>
> ------------------------------------------------------------------------------
> For Developers, A Lot Can Happen In A Second.
> Boundary is the first to Know...and Tell You.
> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
> http://p.sf.net/sfu/Boundary-d2dvs2
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
>
>
>
>
> DISCLAIMER:
> *******************************************************
> This email and any files transmitted with it may be confidential, legally
> privileged and protected in law and are intended solely for the use of the
> individual to whom it is addressed.
> This email was multiple times scanned for viruses. There should be no
> known virus in this email!
> *******************************************************





