Something on your system is too slow.
Nov-21-11 04:03:48 c:/assp/spam
Nov-21-11 04:03:49 File Count: 1,742
Nov-21-11 04:03:49 Processing... spam with 1742 files
Nov-21-11 04:07:22 Removed Old: 6
Nov-21-11 04:07:22 Imported Files: 1,736
Nov-21-11 04:07:22 Finished in 214 second(s)
Even the '35-45 minutes' you report without HMM is too slow.
Without HMM, 10-20 files should be processed per second.
There are several possible reasons for such slowness. I think slow IO
performance is the cause in your case. BerkeleyDB (used for the temporary
HMM DBs) needs a very fast IO system when large DBs like HMM are
processed.
>Nov-21-11 10:19:07 Finished in 3456 second(s)
During this time the ASSP rebuild task reads files, parses them for words
and fills 3 BerkeleyDB databases, nothing else.
Every write ASSP makes to any of these 3 DBs requires multiple reads
and writes.
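To put the two logs in this thread side by side, here is a small Python
sketch that turns "files" and "Finished in N second(s)" into files per
second. The numbers are copied from the log lines in this thread; the
function name and the dictionary labels are only illustrative.

```python
def files_per_second(files: int, seconds: int) -> float:
    """Average rebuild throughput for one folder."""
    return files / seconds


# (file count, seconds) taken from the two "Finished in ..." log excerpts.
runs = {
    "c:/assp/spam": (1742, 214),
    "/usr/share/assp/errors/spam": (4448, 3456),
}

for folder, (files, seconds) in runs.items():
    rate = files_per_second(files, seconds)
    print(f"{folder}: {rate:.1f} files/s")
```

That works out to roughly 8 files/s for the first log and a little over
1 file/s for the second.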
Why should the IO system be the reason?
- the major difference between running with and without HMM is the
number of requested IOs per second
- the rebuild task uses only one core and needs ~100 MB RAM, yet all
processes slow down
It does not matter how many MB/s the IO system can process; IO/s is
much more important.
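A rough illustration of why IO/s matters more than MB/s for this kind of
workload: with small random accesses, even a disk that is fully busy moves
very little data. The figures below (150 random IO/s for a single spinning
disk, 4096-byte pages) are assumptions chosen only for the arithmetic, not
measurements from either system.

```python
def throughput_mb_per_s(iops: float, block_bytes: int) -> float:
    """Data rate that results from a given number of small IOs per second."""
    return iops * block_bytes / 1_000_000


# Hypothetical values, for illustration only.
ASSUMED_RANDOM_IOPS = 150
ASSUMED_PAGE_BYTES = 4096

rate = throughput_mb_per_s(ASSUMED_RANDOM_IOPS, ASSUMED_PAGE_BYTES)
print(f"{ASSUMED_RANDOM_IOPS} IO/s x {ASSUMED_PAGE_BYTES} bytes = {rate:.2f} MB/s")
```

Under these assumptions the disk is saturated at well under 1 MB/s, so a
high sequential MB/s rating says little about how fast the rebuild can run.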
My IO system (in an ESX server) for ASSP is set up as follows:
- LSI high performance SAS controller with 256 MB cache
- 6 disks, each 10K rpm, in a RAID10
On top of that, ASSP has to share the IO system on my machine with three
other major IO-heavy applications: Lotus Domino 8.5.2, MySQL, MS-SQL
Your system has 8 GB RAM. You can try to increase the cache size used by
BerkeleyDB. It is defined in a file named DB_CONFIG. This file must be
created in every tmpDB/HMM* folder and in the tmpDB/rebuildDB folder.
http://www.mathematik.uni-ulm.de/help/BerkeleyDB/ref/env/db_config.html
http://pybsddb.sourceforge.net/ref/am_conf/cachesize.html
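A minimal sketch of how the DB_CONFIG files could be created, assuming the
ASSP base directory contains a tmpDB folder with the HMM* and rebuildDB
subfolders named above. The function name, the base path and the 512 MB
cache size are assumptions for illustration; pick a size that fits your
RAM. The line format "set_cachesize <gbytes> <bytes> <ncache>" is the
one described in the BerkeleyDB documentation linked above.

```python
from pathlib import Path
from typing import List


def write_db_config(assp_base: str,
                    gbytes: int = 0,
                    nbytes: int = 512 * 1024 * 1024,
                    ncache: int = 1) -> List[Path]:
    """Write a DB_CONFIG with set_cachesize into tmpDB/HMM* and tmpDB/rebuildDB."""
    tmpdb = Path(assp_base) / "tmpDB"

    # Every existing HMM* folder, plus the rebuildDB folder if it exists.
    targets = [d for d in sorted(tmpdb.glob("HMM*")) if d.is_dir()]
    rebuild = tmpdb / "rebuildDB"
    if rebuild.is_dir():
        targets.append(rebuild)

    # Cache size = gbytes GB + nbytes bytes, split into ncache regions.
    line = f"set_cachesize {gbytes} {nbytes} {ncache}\n"

    written = []
    for folder in targets:
        cfg = folder / "DB_CONFIG"
        cfg.write_text(line)
        written.append(cfg)
    return written
```

For example, write_db_config("/usr/share/assp") would write the file into
each matching folder below /usr/share/assp/tmpDB, if that is your ASSP
base directory. BerkeleyDB reads DB_CONFIG when the environment is opened,
so the files should be in place before the next rebuild starts.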
Thomas
From: Michael <[email protected]>
To: ASSP development mailing list <[email protected]>
Date: 21.11.2011 11:37
Subject: [Assp-test] best config for rebuildspam and HMM
First: HMM works perfectly!
But the rebuildspam thread now takes up to 12 hours instead of the 35-45
minutes it took without HMM, and it slows down the whole server so much
that other services are barely reachable.
E.g.:
Nov-21-11 09:21:31 /usr/share/assp/errors/spam
Nov-21-11 09:21:31 File Count: 4,448
Nov-21-11 09:21:31 Processing... errors/spam with 4448 files
Nov-21-11 10:19:07 Imported Files: 4,448
Nov-21-11 10:19:07 Finished in 3456 second(s)
I am not sure I have the best setup for this, so any tips for improving
it or finding the bottleneck are welcome!
I am running ASSP with databases and a MySQL server (max database
connections is set to 5000) on Ubuntu 10.04, on an Intel(R) Core(TM) i7
CPU 920 @ 2.67GHz, 8 cores, with 8 GB RAM.
ASSP Config is set with:
- HMMusesBDB: off,
- useDB4Rebuild: on
- useBerkeleyDB: on
- spamdb: DB:
- DBdriver: mysql
- (all) ThreadCycleTime: 0
- useDB4IntCache: off
Thanks for your support!
Michael
------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure
contains a definitive record of customers, application performance,
security threats, fraudulent activity, and more. Splunk takes this
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test