Something on your system is too slow.

Nov-21-11 04:03:48 c:/assp/spam
Nov-21-11 04:03:49 File Count:  1,742
Nov-21-11 04:03:49 Processing... spam with 1742 files
Nov-21-11 04:07:22 Removed Old: 6
Nov-21-11 04:07:22 Imported Files:      1,736
Nov-21-11 04:07:22 Finished in 214 second(s)
 
Even 'hours now instead of 35-45 minutes' is too slow.
Without HMM, 10 to 20 files should be processed per second.
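As a rough cross-check, the throughput of the two rebuild logs in this thread can be computed directly. This is a minimal Python sketch. The file counts and durations come from the quoted logs (Thomas's 1,742 files include the 6 old files that were removed). `files_per_second` and `expected_seconds` are illustrative helpers, not part of ASSP.

```python
# File counts and durations taken from the two rebuild logs quoted in this thread.
runs = {
    "c:/assp/spam (Thomas)": (1742, 214),
    "errors/spam (Michael)": (4448, 3456),
}


def files_per_second(files: int, seconds: int) -> float:
    """Average number of files processed per second."""
    return files / seconds


def expected_seconds(files: int, rate: float) -> float:
    """Time the same folder would take at a given files-per-second rate."""
    return files / rate


for name, (files, seconds) in runs.items():
    print(f"{name}: {files_per_second(files, seconds):.1f} files/s")

# At the 10-20 files/s mentioned above (without HMM), the 4,448 files would take:
for rate in (10, 20):
    print(f"{rate} files/s -> {expected_seconds(4448, rate):.0f} s")
```

On these numbers the errors/spam run averages about 1.3 files/s, several times below 10-20 files/s. At that rate the same 4,448 files would take roughly 4 to 7.5 minutes instead of 3,456 seconds.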

There are several possible reasons for such slowness. In your case I think
slow IO performance is the cause. BerkeleyDB (used for the temporary HMM
DBs) needs a very fast IO system when large DBs like the HMM DB are
processed.

>Nov-21-11 10:19:07 Finished in 3456 second(s)

During this time the ASSP rebuild task does nothing but read files, parse
them for words and fill 3 BerkeleyDB databases.
Every write ASSP makes to any of these 3 DBs requires multiple disk reads
and writes.

Why do I think the IO system is the reason?

- the major difference between running with and without HMM is the number
  of IO requests per second
- the rebuild task uses only one core and needs ~100 MB RAM, yet all
  processes slow down

What matters is not how many MB/s the IO system can transfer; the number
of IO operations per second (IO/s) is much more important.
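To get a feel for IO/s rather than MB/s on a given disk, a crude probe is to time many small writes that are each forced to storage with `fsync`. This is a minimal Python sketch under stated assumptions: it measures sequential 4 KiB appends with an `fsync` after each one, which is only a proxy for small synchronous IO latency and is not how BerkeleyDB itself issues IO. `synced_write_iops` is a hypothetical helper; a dedicated benchmark tool such as fio gives far more reliable random-IO figures.

```python
import os
import tempfile
import time


def synced_write_iops(directory: str, count: int = 200, block_size: int = 4096) -> float:
    """Rough estimate of synchronous small-write operations per second.

    Writes `count` blocks of `block_size` bytes to a scratch file in
    `directory`, calling os.fsync() after each one so that every write must
    reach the storage device (or its write cache) before the next one starts.
    """
    block = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Assumption: point this at the disk that holds ASSP's temporary DB folders;
    # the system temp directory is used here only as a placeholder.
    target = tempfile.gettempdir()
    print(f"~{synced_write_iops(target):.0f} synced 4 KiB writes/s in {target}")
```

Running it once while the server is idle and once during a rebuild shows whether the disk is already saturated by other IO.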

For comparison, my IO system for ASSP (in an ESX server) is set up as
follows:

- LSI high-performance SAS controller with 256 MB cache
- 6 disks, each 10K rpm, in a RAID10

On top of that, ASSP has to share the IO engine on my system with three
other major IO-intensive applications: Lotus Domino 8.5.2, MySQL and MS-SQL.

Your system has 8 GB RAM, so you can try to increase the cache size used by
BerkeleyDB. The cache size is defined in a file named DB_CONFIG, which must
be created in every tmpDB/HMM* folder and in the tmpDB/rebuildDB folder:
http://www.mathematik.uni-ulm.de/help/BerkeleyDB/ref/env/db_config.html
http://pybsddb.sourceforge.net/ref/am_conf/cachesize.html
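Creating the DB_CONFIG files by hand in each folder is easy to get wrong, so here is a minimal Python sketch that writes one into every tmpDB/HMM* folder and into tmpDB/rebuildDB. It uses the Berkeley DB `set_cachesize gbytes bytes ncache` directive described in the linked pages. Assumptions: the ASSP base directory is `/usr/share/assp` (taken from the log path in the quoted mail), the folder layout is as described above, and the 512 MB cache is only an example value; `ASSP_BASE` and the helper functions are hypothetical names.

```python
from pathlib import Path

# Assumption: ASSP base directory as seen in the quoted log (/usr/share/assp).
ASSP_BASE = Path("/usr/share/assp")

# Example value only: 0 GB + 512 MB in a single cache region.
CACHE_GBYTES = 0
CACHE_BYTES = 512 * 1024 * 1024
CACHE_REGIONS = 1


def db_config_text(gbytes: int, nbytes: int, regions: int) -> str:
    """Return a DB_CONFIG line that sets the Berkeley DB cache size."""
    return f"set_cachesize {gbytes} {nbytes} {regions}\n"


def target_folders(base: Path) -> list[Path]:
    """Every tmpDB/HMM* folder plus tmpDB/rebuildDB, if they exist."""
    tmp = base / "tmpDB"
    folders = sorted(p for p in tmp.glob("HMM*") if p.is_dir())
    rebuild = tmp / "rebuildDB"
    if rebuild.is_dir():
        folders.append(rebuild)
    return folders


def write_db_config(base: Path) -> list[Path]:
    """Write (or overwrite) DB_CONFIG in each target folder; return the paths."""
    text = db_config_text(CACHE_GBYTES, CACHE_BYTES, CACHE_REGIONS)
    written = []
    for folder in target_folders(base):
        path = folder / "DB_CONFIG"
        path.write_text(text)
        written.append(path)
    return written


if __name__ == "__main__":
    for path in write_db_config(ASSP_BASE):
        print("wrote", path)
```

Each folder is its own BerkeleyDB environment with its own cache, so the total memory used is roughly the cache size times the number of folders; choose the value with the 8 GB RAM in mind. The setting is read when an environment is opened, so it may only take effect after the temporary DBs are re-created.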


Thomas



From:    Michael <[email protected]>
To:      ASSP development mailing list <[email protected]>
Date:    21.11.2011 11:37
Subject: [Assp-test] best config for rebuildspam and HMM




First: HMM works perfectly!

But the rebuildspam thread now takes up to 12 hours instead of 35-45
minutes without HMM, and it slows down the whole server so much that other
services are barely accessible.
For example:
Nov-21-11 09:21:31 /usr/share/assp/errors/spam
Nov-21-11 09:21:31 File Count:           4,448
Nov-21-11 09:21:31 Processing... errors/spam with 4448 files
Nov-21-11 10:19:07 Imported Files:               4,448
Nov-21-11 10:19:07 Finished in 3456 second(s)

So I am not sure I have the best setup for this; any tips for improving it
or finding the bottleneck are welcome!

I am running ASSP with databases and a MySQL server (max database
connections set to 5000) on an Ubuntu 10.04 server with an Intel(R)
Core(TM) i7 CPU 920 @ 2.67GHz (8 cores) and 8 GB RAM.

The ASSP config is set as follows:
- HMMusesBDB: off
- useDB4Rebuild: on
- useBerkeleyDB: on
- spamdb: DB:
- DBdriver: mysql
- (all) ThreadCycleTime: 0
- useDB4IntCache: off

Thanks for your support!

Michael




------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure 
contains a definitive record of customers, application performance, 
security threats, fraudulent activity, and more. Splunk takes this 
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test




DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally
privileged and protected by law, and are intended solely for the use of the
individual to whom it is addressed.
This email was scanned for viruses multiple times. There should be no
known virus in this email!
*******************************************************

