Re: sa-learn using multiple CPUs?

2021-04-16 Thread Benny Pedersen

On 2021-04-16 03:29, John Hardin wrote:


> > So I will re-configure my installation to use MariaDB.
>
> You should also consider the Redis backend.


I don't like that Redis needs non-default sysctl settings.

Redis is not that much more powerful.

IMHO one could use the MEMORY engine in MySQL and then periodically dump
to SQL, or copy from memory to CSV in MariaDB. Both the MEMORY engine and
the CSV engine are very memory-friendly while still providing fast access.

Maybe I am wrong; I just use PostgreSQL.


Re: sa-learn using multiple CPUs?

2021-04-16 Thread Axb

How hard is it to keep list mail on list and not reply directly to sender?

Have you seen
https://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/ 
?


there may be some helpful info in there.

On 4/16/21 9:47 AM, Christian Völker wrote:
Thanks for the hint. I will monitor it. The machine has 16GB of memory,
which should be sufficient, but I already notice Redis preallocating 2GB.

It is somewhat unclear what happens. If there is no limit I will get an
OOM error and Redis will (if killed) lose the transactions made after
the last "save 900 1" snapshot, right?

If I set a limit it will discard the oldest entries, correct?

Neither seems to be perfect for SpamAssassin.

However, I will ignore the topic for the moment and see how it goes.
16GB should (hopefully) be enough. Once the messages are scanned,
SpamAssassin's expiry should kick in and reduce the amount of memory used.


Greetings

/Christian




Re: sa-learn using multiple CPUs?

2021-04-16 Thread Axb

To avoid surprises, remember to watch your memory usage.
Redis reads/writes the DB in memory and only dumps to disk for backup.

"redis-cli info" is of help
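
As a rough illustration of the monitoring suggested above, the following Python sketch parses the text printed by "redis-cli info memory" and reports how much of the configured limit is in use. The helper names (parse_redis_info, memory_headroom) and the sample values are made up for illustration; used_memory, maxmemory and maxmemory_policy are fields Redis reports in its INFO output.

```python
def parse_redis_info(text):
    """Parse 'key:value' lines from Redis INFO output into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def memory_headroom(info):
    """Return the fraction of maxmemory in use, or None if no limit is set."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    if limit == 0:  # Redis reports 0 when no memory limit is configured
        return None
    return used / limit


# Illustrative sample only; real input would come from `redis-cli info memory`.
SAMPLE = """# Memory
used_memory:2147483648
maxmemory:8589934592
maxmemory_policy:noeviction
"""

if __name__ == "__main__":
    info = parse_redis_info(SAMPLE)
    print(info["maxmemory_policy"], memory_headroom(info))
```

Checking maxmemory_policy alongside the usage figure shows what the server is configured to do once the limit is reached, which is the open question when deciding whether to set a limit at all.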


Re: sa-learn using multiple CPUs?

2021-04-16 Thread Christian Völker

Sorry to annoy you. Another addition to my tests:

When using Redis it took me around 15 seconds to scan ~1,500 messages.
When using MariaDB it took one minute to do the same.
With the file-based backend I had strange issues whichever lock type I used
(flock yes/no):
"bayes: bayes db version 0 is not able to be used, aborting! at
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 206."



Anyway, I am now using Redis, which appears to be the fastest.

Thanks again!

/Christian
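
Taking the rounded figures from the message above at face value (about 1,500 messages in roughly 15 seconds with Redis, one minute with MariaDB), the difference can be expressed as throughput. The Python sketch below is only arithmetic on those two numbers; the backlog size used in the example run is hypothetical and does not come from the thread.

```python
def throughput(messages, seconds):
    """Messages learned per second."""
    return messages / seconds


def eta_hours(backlog, rate):
    """Hours needed to learn `backlog` messages at `rate` messages/second."""
    return backlog / rate / 3600


# Approximate figures quoted in the message above.
redis_rate = throughput(1500, 15)
mariadb_rate = throughput(1500, 60)
speedup = redis_rate / mariadb_rate

if __name__ == "__main__":
    backlog = 1_000_000  # hypothetical archive size, not from the thread
    for name, rate in (("Redis", redis_rate), ("MariaDB", mariadb_rate)):
        print(f"{name}: {rate:.0f} msg/s, {eta_hours(backlog, rate):.1f} h")
    print(f"speedup: {speedup:.1f}x")
```

With these inputs Redis works out to about 100 messages per second against 25 for MariaDB, a factor of four on this particular machine and workload.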



Re: sa-learn using multiple CPUs?

2021-04-15 Thread Christian Völker

Hi,

> > So I will re-configure my installation to use MariaDB.
>
> You should also consider the Redis backend.


OK, I had a look while using MariaDB and monitored it for the last 24 hours.
All 10 of my vCPUs were used, with no I/O waits. But overall CPU usage was
only at 25% according to "top", which showed 75% idle. I assume there is
some locking in place limiting the CPU usage.


I have now configured it to use Redis instead of MySQL, and top tells me about
25% idle with 0% I/O waits when running 10 sa-learn processes in parallel.
Increasing or decreasing the number of jobs does not significantly change
the idle percentage.


So with Redis the CPU usage is higher than with MySQL.

Thanks for ideas!

/Christian
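
To put the "top" figures above in perspective, the idle percentage can be converted into an approximate number of busy vCPUs. This is a small Python sketch that assumes the idle figure is an average over all vCPUs; the function name busy_cores is made up for illustration.

```python
def busy_cores(vcpus, idle_percent):
    """Approximate number of fully busy vCPUs for an averaged idle figure."""
    return vcpus * (100 - idle_percent) / 100


if __name__ == "__main__":
    # Figures reported in the message above for a 10-vCPU machine.
    print("MariaDB:", busy_cores(10, 75))
    print("Redis:  ", busy_cores(10, 25))
```

Under that assumption the MariaDB run kept roughly 2.5 of the 10 vCPUs busy, while the Redis run kept roughly 7.5 busy.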



Re: sa-learn using multiple CPUs?

2021-04-15 Thread John Hardin

On Thu, 15 Apr 2021, Christian Völker wrote:


> Hi,
>
> so I did some testing.
>
> When using bayes_ files as backend and flock only a single process consumes
> CPU (strange, I have seen different behaviour before).
> When using MariaDB as backend all processes use CPU and share them with the
> MariaDB process.
>
> So I will re-configure my installation to use MariaDB.


You should also consider the Redis backend.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Our politicians should bear in mind the fact that
  the American Revolution was touched off by the then-current
  government attempting to confiscate firearms from the people.
---
 4 days until the 246th anniversary of The Shot Heard 'Round The World

Re: sa-learn using multiple CPUs?

2021-04-15 Thread Christian Völker

Hi,

so I did some testing.

When using bayes_ files as backend and flock only a single process 
consumes CPU (strange, I have seen different behaviour before).
When using MariaDB as backend all processes use CPU and share them with 
the MariaDB process.


So I will re-configure my installation to use MariaDB.


Thanks for your input!

/Christian



Re: sa-learn using multiple CPUs?

2021-04-15 Thread Henrik K


If you insist on file-based Bayes, at least make sure you use "lock_method flock".
Or maybe the BDB backend; I don't remember if it's faster.

> On 4/15/21 2:45 PM, Christian Völker wrote:
> > Hi,
> > 
> > well, here it is not I/O bound (running on RAID1 SSDs). I am using the
> > "default" file-based backend ~/.spamassassin/bayes*.
> > 
> > 40 msg/sec is not really fast enough for me. The number of messages to be
> > processed is really huge.
> > 
> > So, asking again: is it possible with the file-based backend to do this
> > stuff in parallel?
> > 
> > Thanks
> > 
> > /Christian
> > 


Re: sa-learn using multiple CPUs?

2021-04-15 Thread Axb

Please keep list mail on list!
If you run parallel sa-learn instances you'll run into locked DB errors.
With an SDBM backend it would be a bit faster but would still lock up.
AFAIK, the Redis backend won't have locking issues.
(Dunno about SQL; I use Redis.)

On 4/15/21 2:45 PM, Christian Völker wrote:

Hi,

well, here it is not I/O bound (running on RAID1 SSDs). I am using the
"default" file-based backend ~/.spamassassin/bayes*.


40 msg/sec is not really fast enough for me. The number of messages to be
processed is really huge.


So, asking again: is it possible with the file-based backend to do this
stuff in parallel?


Thanks

/Christian

Re: sa-learn using multiple CPUs?

2021-04-15 Thread Henrik K
On Thu, Apr 15, 2021 at 08:39:42AM -0400, Greg Troxel wrote:
>
> I don't know, but beware that if you have TXREP configured, and you do
> not use -L to sa-learn, I believe you will end up making DNSBL queries
> for all of them.

Thanks, TxRep actually seems to be the culprit.  Will look into it..

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7881



Re: sa-learn using multiple CPUs?

2021-04-15 Thread Christian Völker

Hi,


> I don't know, but beware that if you have TXREP configured, and you do
> not use -L to sa-learn, I believe you will end up making DNSBL queries
> for all of them.


Good catch! I have not used "-L" so far, and I am pretty sure there is
nothing configured, but from reading the man page it will not do any
harm. So I will add "-L".


Besides this, a test run really came up with 100% usage of a single CPU, so I
doubt it is doing the queries here.


Thanks!

/Christian



Re: sa-learn using multiple CPUs?

2021-04-15 Thread Greg Troxel

Christian Völker  writes:

> I am going to add some large spam archives for my Bayes database with
> sa-learn.
>
> I have a machine with six vCPUs and obviously I would like to speed up
> the learning process. I am thinking of running six sa-learn processes
> in parallel. Is there any issue with this like locks for the database?

I don't know, but beware that if you have TXREP configured, and you do
not use -L to sa-learn, I believe you will end up making DNSBL queries
for all of them.




Re: sa-learn using multiple CPUs?

2021-04-15 Thread Axb
Depending on your Bayes backend, your bottleneck will not be the CPUs 
but I/O.

Normally there's no need for running multiple sa-learn instances.

My sa-learn is learning 40+ msgs/sec from an SSD into a Redis DB.

sa-learn using multiple CPUs?

2021-04-15 Thread Christian Völker

Hi all,

I am going to add some large spam archives for my Bayes database with 
sa-learn.


I have a machine with six vCPUs and obviously I would like to speed up 
the learning process. I am thinking of running six sa-learn processes in 
parallel. Is there any issue with this like locks for the database?


Or is sa-learn itself multithreaded and I do not need to run it in 
parallel (does not look so)?


Next, when running the above in parallel (if possible) should I use the 
"--no-sync" and do the syncing afterwards? But again, this is then only 
single-threaded, right?


Thanks a lot for your input!

/Christian
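
The plan described in the question above (several sa-learn processes in parallel with "--no-sync", followed by a single sync) can be sketched in Python as follows. This is only an outline under stated assumptions: the function names and the round-robin sharding are made up for illustration, "-L" is included to keep learning local, and whether the parallel runs actually help or instead collide on database locks depends on the Bayes backend, as the replies in this thread discuss.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(items, n):
    """Distribute items over n shards; empty shards are dropped."""
    shards = [items[i::n] for i in range(n)]
    return [s for s in shards if s]


def learn_command(paths, kind="spam"):
    """Build one sa-learn invocation: local-only (-L), sync deferred."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--no-sync", "-L", *paths]


def learn_parallel(paths, workers=6, kind="spam"):
    """Run one sa-learn process per shard, then a single final sync."""
    commands = [learn_command(s, kind) for s in split_round_robin(paths, workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads only wait on the child processes; the work happens in sa-learn.
        list(pool.map(lambda c: subprocess.run(c, check=True), commands))
    # The deferred sync runs once, after all learners have finished.
    subprocess.run(["sa-learn", "--sync"], check=True)
```

A call such as learn_parallel(list_of_mail_dirs, workers=6) would start up to six learners, one per shard of the (hypothetical) list_of_mail_dirs; the final sync remains a single process.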