Re: sa-learn using multiple CPUs?
On 2021-04-16 03:29, John Hardin wrote:
> > So I will re-configure my installation to use MariaDB.
> You should also consider the Redis backend.
I don't like that Redis needs non-default sysctl settings, and IMHO it does not have that much more power. One could use the MEMORY engine in MySQL and then periodically dump to SQL, or copy from MEMORY to CSV in MariaDB; both the MEMORY and CSV engines are very light on memory while still giving fast access. Maybe I am wrong, I just use PostgreSQL.
Re: sa-learn using multiple CPUs?
How hard is it to keep list mail on list and not reply directly to the sender? Have you seen https://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/ ? There may be some helpful info in there.
On 4/16/21 9:47 AM, Christian Völker wrote:
> Thanks for the hint. I will monitor it. The machine has 16GB of memory, which should be sufficient, but I already notice Redis preallocating 2GB. It is somewhat unclear what happens. If there is no limit, I will get an OOM error and Redis will (if killed) lose the transactions since the last "save 900 1" snapshot, right? If I set a limit, it will discard the oldest entries, correct? Neither seems ideal for SpamAssassin. However, I will ignore the topic for the moment and see how it goes. 16GB should (hopefully) be enough. Once scanning is done, SpamAssassin's expiry should take effect and reduce the amount of memory.
> Greetings
> /Christian
Re: sa-learn using multiple CPUs?
To avoid surprises, remember to watch your memory usage. Redis reads/writes the DB in memory and only dumps to disk for backup. "redis-cli info" is of help.
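A minimal Python sketch of that check, assuming `redis-cli` is on PATH and Redis listens on the default 127.0.0.1:6379 (adjust host and port as needed). The helper names `parse_info`, `memory_report` and `check_redis` are hypothetical. It reads the `INFO memory` section and compares `used_memory` with `maxmemory`; a `maxmemory` of 0 means no limit is configured. What Redis does once a limit is reached depends on `maxmemory_policy` (the default, `noeviction`, rejects writes rather than discarding old keys), so the policy is reported as well.

```python
import subprocess


def parse_info(text):
    """Parse Redis INFO output ("key:value" lines, "# Section" headers) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def memory_report(info):
    """Summarize used vs. configured memory; maxmemory 0 means no limit."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    policy = info.get("maxmemory_policy", "unknown")
    share = used / limit if limit else None  # None when no limit is set
    return {"used": used, "limit": limit, "policy": policy, "share": share}


def check_redis(host="127.0.0.1", port=6379):
    """Run `redis-cli info memory` and summarize the result."""
    out = subprocess.run(
        ["redis-cli", "-h", host, "-p", str(port), "info", "memory"],
        capture_output=True, text=True, check=True,
    ).stdout
    return memory_report(parse_info(out))
```

Calling `print(check_redis())` from cron or a shell loop during a large learning run is enough to see whether memory keeps growing toward the limit.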
Re: sa-learn using multiple CPUs?
Sorry to annoy you. Another addition to my tests: when using Redis, it took me around 15 seconds to scan ~1,500 messages. When using MariaDB, it took one minute to do the same. With the file-based backend I had strange issues whatever lock type I used (flock yes/no): "bayes: bayes db version 0 is not able to be used, aborting! at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 206." Anyway, I am now using Redis, which appears to be the fastest. Thanks again! /Christian
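As a rough worked comparison of the Redis and MariaDB timings (the ~1,500 count is approximate, so treat the rates as ballpark figures), here is a small Python sketch. `rate` and `eta_hours` are hypothetical helpers, and the total corpus size for an ETA is something you would plug in yourself.

```python
def rate(messages, seconds):
    """Messages learned per second."""
    return messages / seconds


def eta_hours(total_messages, msgs_per_sec):
    """Hours needed to learn total_messages at msgs_per_sec."""
    return total_messages / msgs_per_sec / 3600


redis_rate = rate(1500, 15)    # Redis run: ~1,500 messages in ~15 s
mariadb_rate = rate(1500, 60)  # MariaDB run: the same messages in ~1 minute
speedup = redis_rate / mariadb_rate
```

That works out to about 100 msgs/sec for Redis versus 25 msgs/sec for MariaDB, roughly a 4x difference; at 100 msgs/sec, every 360,000 messages take about an hour.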
Re: sa-learn using multiple CPUs?
Hi,
> > So I will re-configure my installation to use MariaDB.
> You should also consider the Redis backend.
OK, I had a look when using MariaDB and monitored it for the last 24 hours. All 10 vCPUs were in use, with no I/O waits, but overall CPU usage according to "top" was only 25%, as top showed 75% idle. I assume there is some locking in place limiting the CPU usage. I have now configured it to use Redis instead of MySQL, and top shows about 25% idle with 0% I/O waits when running 10 sa-learn processes in parallel. Increasing or decreasing the number of jobs does not significantly change the idle percentage. So with Redis the CPU usage is higher than with MySQL. Thanks for ideas! /Christian
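To reproduce idle and I/O-wait figures like the ones top shows, here is a Linux-only Python sketch that samples the aggregate `cpu` line of `/proc/stat` twice and compares the deltas. This is an assumed way of measuring, not what was used here; `read_cpu_times`, `idle_iowait_percent`, `busy_cores` and `sample` are hypothetical names. `busy_cores` turns an idle percentage into an approximate number of fully busy vCPUs.

```python
import time


def read_cpu_times(stat_text):
    """Parse the aggregate "cpu" line of /proc/stat into (idle, iowait, total) jiffies."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Columns: user nice system idle iowait irq softirq steal. Guest time
            # is already included in user/nice, so later columns are skipped.
            vals = [int(v) for v in fields[1:9]]
            return vals[3], vals[4], sum(vals)
    raise ValueError("no aggregate cpu line found")


def idle_iowait_percent(before, after):
    """Idle and I/O-wait shares of all CPU time between two samples, in percent."""
    d_idle = after[0] - before[0]
    d_iowait = after[1] - before[1]
    d_total = after[2] - before[2]
    return 100 * d_idle / d_total, 100 * d_iowait / d_total


def busy_cores(idle_pct, vcpus):
    """Approximate number of fully busy vCPUs for a given idle percentage."""
    return (100 - idle_pct) / 100 * vcpus


def sample(interval=5.0):
    """Measure idle and I/O-wait percentages over `interval` seconds."""
    with open("/proc/stat") as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_times(f.read())
    return idle_iowait_percent(before, after)
```

With 10 vCPUs, `busy_cores(75, 10)` gives 2.5 (the MariaDB run) and `busy_cores(25, 10)` gives 7.5 (the Redis run).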
Re: sa-learn using multiple CPUs?
On Thu, 15 Apr 2021, Christian Völker wrote:
> So I will re-configure my installation to use MariaDB.
You should also consider the Redis backend.
-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/
Re: sa-learn using multiple CPUs?
Hi, so I did some testing. When using bayes_ files as the backend with flock, only a single process consumes CPU (strange, I have seen different behaviour before). When using MariaDB as the backend, all processes use CPU and share it with the MariaDB process. So I will re-configure my installation to use MariaDB. Thanks for your input! /Christian
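Why only one process burns CPU with file-based storage can be illustrated with a generic exclusive-lock experiment. This Python sketch is not SpamAssassin's code; it only assumes, as the observation above suggests, that each learner holds an exclusive `flock` while it updates the database. Several workers (threads here, each opening the file separately, which is enough for `flock` to make them conflict) each hold the lock for a fixed time, so the total wall time is roughly the sum of the hold times rather than a single hold time. `locked_work` and `run_workers` are hypothetical names.

```python
import fcntl
import os
import tempfile
import threading
import time


def locked_work(path, hold):
    """Take an exclusive flock on `path`, 'work' for `hold` seconds, then release."""
    with open(path, "a") as f:          # each worker opens its own descriptor
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks while another worker holds it
        try:
            time.sleep(hold)            # stand-in for updating the Bayes DB
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def run_workers(n=4, hold=0.2):
    """Start n workers at once; return wall-clock seconds until all finish."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        threads = [threading.Thread(target=locked_work, args=(path, hold))
                   for _ in range(n)]
        start = time.monotonic()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.monotonic() - start
    finally:
        os.remove(path)
```

`run_workers(4, 0.2)` should take at least about 0.8 s; without the `flock` calls the same four workers would finish in about 0.2 s.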
Re: sa-learn using multiple CPUs?
If you insist on file bayes, at least make sure you use "lock_method flock". Or maybe the BDB backend, I don't remember if it's faster.
Re: sa-learn using multiple CPUs?
Please keep list mail on list! If you run parallel sa-learn instances, you'll run into locked DB errors. With an SDBM backend it would be a bit faster but would still lock up. AFAIK, the Redis backend won't have locking issues (dunno about SQL, I use Redis).
On 4/15/21 2:45 PM, Christian Völker wrote:
> Hi,
> well, here it is not I/O bound (running on RAID1 SSDs). I am using the "default" file-based backend ~/.spamassassin/bayes*.
> 40 msg/sec is not really fast enough for me. The number of messages to be processed is really huge.
> So again asking: is it possible with the file-based backend to do this stuff in parallel?
> Thanks
> /Christian
Re: sa-learn using multiple CPUs?
On Thu, Apr 15, 2021 at 08:39:42AM -0400, Greg Troxel wrote:
> I don't know, but beware that if you have TXREP configured, and you do not use -L to sa-learn, I believe you will end up making DNSBL queries for all of them.
Thanks, TxRep actually seems to be the culprit. Will look into it. https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7881
Re: sa-learn using multiple CPUs?
Hi,
> I don't know, but beware that if you have TXREP configured, and you do not use -L to sa-learn, I believe you will end up making DNSBL queries for all of them.
Good catch! I have not used "-L" so far. I am pretty sure nothing like that is configured, but from reading the man page it will not do any harm, so I will add "-L". Besides this, a test run really came up with 100% single-CPU usage, so I doubt it is doing the queries here. Thanks! /Christian
Re: sa-learn using multiple CPUs?
Christian Völker writes:
> I am going to add some large spam archives for my Bayes database with sa-learn.
> I have a machine with six vCPUs and obviously I would like to speed up the learning process. I am thinking of running six sa-learn processes in parallel. Is there any issue with this like locks for the database?
I don't know, but beware that if you have TXREP configured, and you do not use -L to sa-learn, I believe you will end up making DNSBL queries for all of them.
Re: sa-learn using multiple CPUs?
Depending on your Bayes backend, your bottleneck will not be the CPUs but I/O. Normally there's no need for running multiple sa-learn instances. My sa-learn is learning 40+ msgs/sec from an SSD into a Redis DB.
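One way to check whether a single run is CPU-bound or I/O-bound is to compare the CPU time a child process used with the wall-clock time it took: a ratio near 1.0 means one core was busy the whole time, while a much lower ratio means the process spent most of its time waiting on I/O, locks or the network. A Unix-only Python sketch; `cpu_vs_wall` is a hypothetical helper, and the `sa-learn` arguments and path in the usage line are placeholders.

```python
import resource
import subprocess
import time


def cpu_vs_wall(cmd):
    """Run cmd; return (wall_seconds, child_cpu_seconds, cpu/wall ratio)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # waits, so the child's usage is recorded
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User + system CPU consumed by children that finished during this call.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu, cpu / wall if wall else 0.0
```

For example, `cpu_vs_wall(["sa-learn", "--spam", "-L", "/path/to/spam-sample"])` on a small sample shows which case applies. `RUSAGE_CHILDREN` only counts children that have exited and been waited for, which `subprocess.run` does.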
sa-learn using multiple CPUs?
Hi all, I am going to add some large spam archives for my Bayes database with sa-learn. I have a machine with six vCPUs and obviously I would like to speed up the learning process. I am thinking of running six sa-learn processes in parallel. Is there any issue with this, like locks on the database? Or is sa-learn itself multithreaded, so that I do not need to run it in parallel (it does not look like it)? Next, when running the above in parallel (if possible), should I use "--no-sync" and do the syncing afterwards? But again, this is then only single-threaded, right? Thanks a lot for your input! /Christian
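A minimal Python sketch of the plan in the question, assuming a directory with one message per file and `sa-learn` on PATH. It splits the files into `jobs` shards, runs one `sa-learn --spam --no-sync -L` per shard at the same time (in batches so the argument list stays short; `-L` skips network tests such as the DNSBL lookups discussed in this thread), then runs a single `sa-learn --sync`. The helper names and the batch size of 500 are hypothetical. As the replies report, whether this actually helps depends on the backend: with file-based storage the learners largely serialize on the database lock.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

BATCH = 500  # hypothetical cap on paths per sa-learn call (keeps argv short)


def shard(items, n):
    """Split items round-robin into n lists."""
    shards = [[] for _ in range(n)]
    for i, item in enumerate(items):
        shards[i % n].append(item)
    return shards


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_learn_cmd(files, kind="spam"):
    """One sa-learn call: --no-sync defers the sync, -L skips network tests."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", "--no-sync", "-L", *map(str, files)]


def learn_shard(files, kind):
    """Feed one shard to sa-learn batch by batch; return the worst exit code."""
    worst = 0
    for batch in batches(files, BATCH):
        worst = max(worst, subprocess.run(build_learn_cmd(batch, kind)).returncode)
    return worst


def learn_parallel(corpus_dir, jobs=6, kind="spam"):
    """Learn all files in corpus_dir with `jobs` concurrent sa-learn processes."""
    files = sorted(p for p in Path(corpus_dir).iterdir() if p.is_file())
    # Threads only wait on the child processes; each sa-learn is its own process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(lambda s: learn_shard(s, kind), shard(files, jobs)))
    # One serial sync at the end, as proposed in the question.
    codes.append(subprocess.run(["sa-learn", "--sync"]).returncode)
    return codes
```

`learn_parallel("/path/to/spam-archive", jobs=6)` returns one exit code per shard followed by the exit code of the final sync.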