Re: Spoofed amazon order email

2021-04-16 Thread Bill Cole
On 16 Apr 2021, at 11:25, Greg Troxel wrote:

>   Probably not for normals, score up MPART_ALT_DIFF because nobody
>   should be sending mail with a text/plain part that is not semantically
>   equivalent to the html.

It seems like a bug that this message didn't match MPART_ALT_DIFF.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Spoofed amazon order email

2021-04-16 Thread @lbutlr
On 16 Apr 2021, at 16:16, RW  wrote:
> On Fri, 16 Apr 2021 11:25:19 -0400 Greg Troxel wrote:
> 
>>  Probably not for normals, score up MPART_ALT_DIFF because nobody
>>  should be sending mail with a text/plain part that is not
>>  semantically equivalent to the html.
> 
> Unfortunately it's quite common. 

Yep. Often the plain text part is just a URL to the page containing the html 
version of the attachment, and this is not a particularly good spam indicator, 
sadly. In fact, it might be a counter indicator.

-- 
I can't die, I haven't seen The Jolson Story



Re: Spoofed amazon order email

2021-04-16 Thread John Hardin

On Fri, 16 Apr 2021, RW wrote:


> On Fri, 16 Apr 2021 11:25:19 -0400
> Greg Troxel wrote:
>
>>  Probably not for normals, score up MPART_ALT_DIFF because nobody
>>  should be sending mail with a text/plain part that is not
>>  semantically equivalent to the html.
>
> Unfortunately it's quite common.


+1 {fume}

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Our politicians should bear in mind the fact that
  the American Revolution was touched off by the then-current
  government attempting to confiscate firearms from the people.
---
 3 days until the 246th anniversary of The Shot Heard 'Round The World


Re: Spoofed amazon order email

2021-04-16 Thread @lbutlr
On 16 Apr 2021, at 16:03, John Hardin  wrote:
>   header __FROM_NAME_AMAZONCOM From:name =~ /\bamazon\.com\b/i
>   meta   POSSIBLE_AMAZON_PHISH_01  (__FROM_NAME_AMAZONCOM && NAME_EMAIL_DIFF)
>   meta   POSSIBLE_AMAZON_PHISH_02  (__FROM_NAME_AMAZONCOM && !__HDR_RCVD_AMAZON)

It seems something like this should be built in for sites like amazon.com, 
PayPal.com, google.com, apple.com, citi.com, etc.

Not gmail, of course; it would fail spectacularly if used for that. But for 
stores and banks and such, this seems bloody obvious. Probably a 
score of 0.01 for POSSIBLE_AMAZON_PHISH_01, but I don't see anything wrong with a 
killshot 5.0 for POSSIBLE_AMAZON_PHISH_02. (Not that I am testing it with a 5.0 
score, but I sure expect to see a score around there).
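
As a rough illustration of how such per-brand rules could be generated rather than hand-written, here is a minimal Python sketch. It only emits the From:name / NAME_EMAIL_DIFF pair modelled on the rules quoted above. The brand list, the rules_for helper and the POSSIBLE_<BRAND>_PHISH_01 naming are assumptions made for illustration; nothing like this is known to ship with SpamAssassin.

```python
import re

# Brand domains named in the message above; the list is illustrative only.
BRANDS = ["amazon.com", "paypal.com", "google.com", "apple.com", "citi.com"]


def rules_for(domain: str) -> str:
    """Return SpamAssassin rule text for one brand domain (hypothetical naming)."""
    # Rule-name suffix, e.g. AMAZONCOM or PAYPALCOM.
    tag = re.sub(r"[^A-Za-z0-9]", "", domain).upper()
    # Escape the dot so the generated regex only matches a literal ".".
    pattern = domain.replace(".", r"\.")
    return (
        f"header __FROM_NAME_{tag} From:name =~ /\\b{pattern}\\b/i\n"
        f"meta   POSSIBLE_{tag}_PHISH_01  (__FROM_NAME_{tag} && NAME_EMAIL_DIFF)\n"
    )


if __name__ == "__main__":
    print("\n".join(rules_for(d) for d in BRANDS))
```

The generated text would still need scores and testing against real ham before use; the second, received-header variant is left out because an equivalent of __HDR_RCVD_AMAZON would have to exist for each brand.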

-- 
Hamburgers. The cornerstone of any nutritious breakfast.



Re: Spoofed amazon order email

2021-04-16 Thread Loren Wilton
While I haven't received a forged Amazon order email in this exact form, 
there is all kinds of stuff here that could be caught with appropriate 
rules.


   "In-case you require any
   change in order or like to cancel we recommend giving us call
   immediately at "

"In-case" is unlikely in mail; there should be no dash there.
"giving us call" is missing "a" and is bad grammar, but typical of 
non-English-speaking spam.

"In case you require any change in order" is also poor phrasing.
The whole "call us immediately to change your order" concept rates 3 points 
on my mail system.

No phrase of any similar sort appears in a real Amazon order confirmation.


An actual Amazon order has a subject of the form

   Subject: Your Amazon.com order #114-2489974-7888243

The Subject here is

   Subject: IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed

The order number is in the wrong format.
The order number is in the wrong place in the subject text.
The subject text is in the wrong format.
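
The subject checks above can be sketched in a few lines of Python. This is only an illustration of the two observations (order number format, overall subject format); the function name and the wording of the messages are made up here, and the pattern assumes the NNN-NNNNNNN-NNNNNNN form seen in the real example.

```python
import re

# Order number in the form seen in the genuine subject: 3-7-7 digits.
ORDER_ID = re.compile(r"\d{3}-\d{7}-\d{7}")
# Whole-subject form of a genuine confirmation, as described above.
REAL_SUBJECT = re.compile(r"^Your Amazon\.com order #\d{3}-\d{7}-\d{7}\s*$")


def subject_problems(subject: str) -> list:
    """Return a list of human-readable problems; empty means the subject looks genuine."""
    problems = []
    if ORDER_ID.search(subject) is None:
        problems.append("order number is not in NNN-NNNNNNN-NNNNNNN form")
    if REAL_SUBJECT.match(subject) is None:
        problems.append("subject is not 'Your Amazon.com order #<number>'")
    return problems


if __name__ == "__main__":
    for s in (
        "Your Amazon.com order #114-2489974-7888243",
        "IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed",
    ):
        print(s, "->", subject_problems(s))
```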


An actual Amazon order confirmation has the headers, in this order:

   From: "Amazon.com" 
   Reply-To: no-re...@amazon.com
   To: 
   Message-ID: <010001774af541dc-d38f4184-621e-4014-a295-c520285ae319-000...@email.amazonses.com>
   Subject: Your Amazon.com order #114-2489974-7888242

This mail has

   From: "or...@amazon.com" 
   X-Google-Original-From: "or...@amazon.com" 
   Content-Type: multipart/alternative;
   boundary="===2707982310301423984=="
   MIME-Version: 1.0
   Subject: IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed
   To: s...@dondley.com

The header order is completely different.
There is no Reply-To header
The From address is completely wrong.
There should be no X-Google-* headers.


There should also be a header:

   X-AMAZON-MAIL-RELAY-TYPE: notification
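
The header differences listed above can be checked mechanically. Below is a minimal Python sketch using the standard email module; the header_findings helper and the sample addresses (all under .example) are assumptions for illustration, since the real addresses are obscured in the archive. Header order is not checked here.

```python
from email import message_from_string
from email.message import Message
from typing import List

# Hypothetical stand-in for the phish headers; addresses are placeholders.
SAMPLE_PHISH = (
    'From: "orders@amazon.example" <sender@freemail.example>\n'
    'X-Google-Original-From: "orders@amazon.example" <sender@freemail.example>\n'
    "MIME-Version: 1.0\n"
    "Subject: IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed\n"
    "\n"
    "body\n"
)


def header_findings(msg: Message) -> List[str]:
    """List the header-level differences from a genuine order confirmation."""
    findings = []
    if msg["Reply-To"] is None:
        findings.append("no Reply-To header")
    # Header name lookups are case-insensitive, but keys() returns names as sent.
    if any(name.lower().startswith("x-google-") for name in msg.keys()):
        findings.append("unexpected X-Google-* header")
    if msg["X-AMAZON-MAIL-RELAY-TYPE"] is None:
        findings.append("no X-AMAZON-MAIL-RELAY-TYPE header")
    return findings


if __name__ == "__main__":
    print(header_findings(message_from_string(SAMPLE_PHISH)))
```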

A real Amazon order receipt has Content-Type = multipart/alternative, but it 
only contains a text/plain part encoded in QP, with no HTML part. This 
message has an HTML part and should be getting MPART_ALT_DIFF.
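
To make "the text/plain part differs from the HTML part" concrete, here is a crude Python sketch that compares the word sets of the two alternatives. It is not how SpamAssassin computes MPART_ALT_DIFF; the Jaccard overlap, the helper names and the sample HTML are assumptions chosen purely for illustration.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_words(html: str) -> set:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return set(" ".join(parser.chunks).lower().split())


def make_alternative(plain: str, html: str) -> EmailMessage:
    """Build a multipart/alternative message with a plain and an HTML part."""
    msg = EmailMessage()
    msg.set_content(plain)
    msg.add_alternative(html, subtype="html")
    return msg


def alt_parts_overlap(msg: EmailMessage) -> float:
    """Jaccard overlap (0.0 to 1.0) between words of the plain and HTML parts."""
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    if plain is None or html is None:
        raise ValueError("message lacks a text/plain or text/html part")
    plain_words = set(plain.get_content().lower().split())
    html_words = html_to_words(html.get_content())
    union = plain_words | html_words
    return len(plain_words & html_words) / len(union) if union else 1.0


if __name__ == "__main__":
    phish = make_alternative(
        "Hello there, S!\n\nThis is a test template...\n",
        "<p>Dear S</p><p>Thank you for shopping with us.</p>",
    )
    print(alt_parts_overlap(phish))
```

A low overlap only says the parts differ; as noted elsewhere in the thread, that is common in legitimate mail too, so it is a weak signal on its own.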




   "This email was sent from a
   customer service address kindly write us back if you have any concern. "

This is bad grammar and a very unlikely form for an automated account 
notice. A real Amazon order contains


   "This email was sent from a notification-only address that cannot accept inc=
   oming email. Please do not reply to this message."

This is a very standard phrasing for this sort of notice.


A real Amazon order confirmation does not contain an "unsubscribe" link. 
This phish does.



There is a lot of other stuff that could be caught by various rules, but a 
trivial set would be something like


#---
# 04/16/2021
# A bunch of rules to try to catch fake Amazon order confirmations, based on a
# message pasted to the SA Users list.

header __LW_SUB_AMZ_ORDER Subject =~ /^Your Amazon\.com order \#\d{3}-\d{7}-\d{7}\s*$/
header __LW_FROM_AMZ_ORDER From =~ /\"Amazon\.com\"\s+/
header __LW_REP_AMZ_ORDER Reply-To =~ /^no-reply\@amazon\.com\s*$/
body __LW_BODY_AMZ_ORDER /Amazon\.com Order Confirmation/

meta LW_REAL_AMZ_ORDER __LW_SUB_AMZ_ORDER && __LW_FROM_AMZ_ORDER && __LW_REP_AMZ_ORDER && __LW_BODY_AMZ_ORDER

score LW_REAL_AMZ_ORDER -2
describe LW_REAL_AMZ_ORDER Amazon order confirmation

header __LW_FROM_AMZ From =~ /\bamazon\b/i
header __LW_SUB_ORDER Subject =~ /\border\b/i

meta LW_FAKE_AMZ_ORDER __LW_FROM_AMZ && __LW_SUB_ORDER && !LW_REAL_AMZ_ORDER
score LW_FAKE_AMZ_ORDER 7
describe LW_FAKE_AMZ_ORDER Amazon order phish
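
The way the two meta rules combine can be mimicked in Python to sanity-check the logic before loading the rules. This is only a sketch: the Mail class, the function names and the sample addresses are invented for illustration, the patterns are adapted from the rules above, and SpamAssassin's own header and body rendering may differ from plain strings.

```python
import re
from dataclasses import dataclass


@dataclass
class Mail:
    subject: str
    from_header: str
    reply_to: str
    body: str


def real_amz_order(m: Mail) -> bool:
    """Mirror of LW_REAL_AMZ_ORDER: all four sub-rules must hit."""
    return all((
        re.search(r"^Your Amazon\.com order #\d{3}-\d{7}-\d{7}\s*$", m.subject),
        re.search(r'"Amazon\.com"\s+', m.from_header),
        re.search(r"^no-reply@amazon\.com\s*$", m.reply_to),
        re.search(r"Amazon\.com Order Confirmation", m.body),
    ))


def fake_amz_order(m: Mail) -> bool:
    """Mirror of LW_FAKE_AMZ_ORDER: looks like an Amazon order but fails the real check."""
    from_amz = re.search(r"\bamazon\b", m.from_header, re.IGNORECASE)
    sub_order = re.search(r"\border\b", m.subject, re.IGNORECASE)
    return bool(from_amz and sub_order) and not real_amz_order(m)


if __name__ == "__main__":
    phish = Mail(
        subject="IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed",
        from_header='"orders@amazon.com" <sender@freemail.example>',  # placeholder address
        reply_to="",
        body="Thank you for shopping with us.",
    )
    print(real_amz_order(phish), fake_amz_order(phish))
```

Note the design: the negative-scoring "real" rule is strict, and the "fake" rule is deliberately loose but gated on the strict one failing.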

You might also like

body LW_PAYMENT /You\s+sent\s+a\s+Payment\s+of/i
score LW_PAYMENT 0.5
describe LW_PAYMENT You sent someone a payment

body LW_ORDER /\b(?:order|purchase)\s+(?:number|ID|date|description)\b/i
score LW_ORDER 0.5
describe LW_ORDER Contains order information
meta LW_FREEMAIL_ORDER FREEMAIL_FROM && (LW_ORDER || LW_PAYMENT)
score LW_FREEMAIL_ORDER 4
describe LW_FREEMAIL_ORDER An order receipt from a free email address



Re: Spoofed amazon order email

2021-04-16 Thread RW
On Fri, 16 Apr 2021 11:25:19 -0400
Greg Troxel wrote:

>   Probably not for normals, score up MPART_ALT_DIFF because nobody
>   should be sending mail with a text/plain part that is not
>   semantically equivalent to the html.
 
Unfortunately it's quite common. 


Re: Spoofed amazon order email

2021-04-16 Thread John Hardin

On Fri, 16 Apr 2021, Steve Dondley wrote:

> First, thanks to everyone on the list who has given me a hand over the past 
> couple of weeks as I get my "sea legs" with spamassassin. It's working well 
> for me now but I obviously still have more to learn.
>
> For one, I'm still uncertain on the best way to fine tune SA to beat back 
> some tricky spam. Like this one that comes from a gmail account but spoofs a 
> fake, expensive order on amazon to try to phish the user.



This is telling:

From: "or...@amazon.com" 

...and it's detected:

0.9 NAME_EMAIL_DIFF        Sender NAME is an unrelated email address

...but the score is low due to that happening a lot in legit email, so we 
need tighter focus.


I'll add this to my sandbox and see how it does:

   header __FROM_NAME_AMAZONCOM From:name =~ /\bamazon\.com\b/i
   meta   POSSIBLE_AMAZON_PHISH_01  (__FROM_NAME_AMAZONCOM && NAME_EMAIL_DIFF)
   meta   POSSIBLE_AMAZON_PHISH_02  (__FROM_NAME_AMAZONCOM && !__HDR_RCVD_AMAZON)

You are welcome to add it to your local config. Potentially other 
variations would be useful.
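
For anyone who wants to try the idea outside SpamAssassin first, the display-name check can be approximated in Python with email.utils.parseaddr. This is a sketch only: spoofed_amazon_name is a made-up helper, the addresses are placeholders, and the real NAME_EMAIL_DIFF and __HDR_RCVD_AMAZON rules do their own, different matching.

```python
import re
from email.utils import parseaddr


def spoofed_amazon_name(from_header: str) -> bool:
    """True if the display name mentions amazon.com but the address is elsewhere."""
    name, addr = parseaddr(from_header)
    claims_amazon = re.search(r"\bamazon\.com\b", name, re.IGNORECASE) is not None
    # Domain part of the actual sender address, lower-cased for comparison.
    domain = addr.rpartition("@")[2].lower()
    from_amazon = domain == "amazon.com" or domain.endswith(".amazon.com")
    return claims_amazon and not from_amazon


if __name__ == "__main__":
    for header in (
        '"orders@amazon.com" <sender@freemail.example>',
        '"Amazon.com" <confirm@amazon.com>',
        "Some Person <person@freemail.example>",
    ):
        print(header, "->", spoofed_amazon_name(header))
```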


   -0.0 BAYES_20   BODY: Bayes spam probability is 5 to 20%

Train your Bayes...

What is this?

   0.0 GB_FROM_NAME_FREEMAIL  Freemail spear phish with free mail

Is that local? If not, you might want to increase the score on that a bit. 
Giovanni, is that something of yours that's not in your SA sandbox?




--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Our politicians should bear in mind the fact that
  the American Revolution was touched off by the then-current
  government attempting to confiscate firearms from the people.
---
 3 days until the 246th anniversary of The Shot Heard 'Round The World


Re: sa-learn using multiple CPUs?

2021-04-16 Thread Benny Pedersen

On 2021-04-16 03:29, John Hardin wrote:


So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


I don't like that Redis needs non-default sysctl settings.

Redis does not have that much more power.

IMHO one could use the MEMORY engine in MySQL and then periodically dump to 
SQL, or copy from memory to CSV in MariaDB; both the MEMORY engine and the CSV 
engine are very memory-friendly while still providing fast access.


Maybe I am wrong; I just use PostgreSQL.


Re: Spoofed amazon order email

2021-04-16 Thread Benny Pedersen

On 2021-04-16 17:10, Steve Dondley wrote:


From: "or...@amazon.com" 
X-Google-Original-From: "or...@amazon.com" 


wow, Google accepts it

header LOCAL_AMAZON From:name =~ /\@amazon\.com$/i
header LOCAL_GMAIL From:addr =~ /\@gmail\.com$/i

meta LOCAL_SPOOFED (LOCAL_AMAZON && LOCAL_GMAIL)

untested, just written from memory :=)

it is silly that Google accepts the X-Google-Original-From

I bet there is no real name in this world that includes an @


Re: Spoofed amazon order email

2021-04-16 Thread Antony Stone
On Friday 16 April 2021 at 17:26:40, Dave Wreski wrote:

> > And how the hell is google letting this crap flow out of its email
> > service, anyway?
> 
> Because they're in the email business, not the email security business.

I would add that Google do spam filtering on *inbound* mail, because that means 
they can tell their users (customers) that Google is protecting them.

For *outbound* email going to the rest of the world, that's their (rest of 
world) lookout.


Antony.

-- 
I thought I had type A blood, but it turned out to be a typo.

   Please reply to the list;
 please *don't* CC me.


Re: Spoofed amazon order email

2021-04-16 Thread Dave Wreski

Hi Steve,

As Antony just reported, post these spamples to something like 
pastebin.com then provide a link so we can view the raw email.


X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on 


This is the first issue I see - you're likely missing a lot of 
additional features of later versions, as well as regular updates.



From: "or...@amazon.com" 


I believe this mismatch would also be caught with later versions.


-0.0 BAYES_20   BODY: Bayes spam probability is 5 to 20%


This is the other big issue - you need to train these to recognize this 
as spam/phishing. You can also go through your quarantine to find spam 
that hasn't been properly trained to use as a corpus.


> And how the hell is google letting this crap flow out of its email 
> service, anyway?


Because they're in the email business, not the email security business.

Go here and make sure you're using the KAM channel (as well as the 
regular sa-updates channel).

https://mcgrail.com/template/kam.cf_channel

Best,
Dave


Re: Spoofed amazon order email

2021-04-16 Thread Greg Troxel

My advice

  realize that you can't block everything

  set up TXREP, including outgoing processing

  wait until after you have a week of TXREP data because that will
  improve scores of legit mail enough, for the most part, that the
  tweaks below and the more aggressive scores from KAM will not hurt.  I
  had misfiling (technically not given the 5.0 points doctrine) from
  some of the KAM rules.  But with TXREP, they don't cause problems.

  tweak scores up for rules that hit on this mail like NAME_EMAIL_DIFF,
  GB_FROM_NAME_FREEMAIL, FREEMAIL_FROM, and FREEMAIL_ENVFROM_END_DIGIT

  Use the KAM rules, and then be prepared to maybe downweight some if
  they cause you issues (e.g. KAM_UNIV at 4.5 is too aggressive for me
  as a single rule that has fired on ham, but I'm ok with 3).  But with
  TXREP trained a little, I'd be surprised if you see real problems.

  Probably not for normals, score up MPART_ALT_DIFF because nobody
  should be sending mail with a text/plain part that is not semantically
  equivalent to the html.


It may be controversial to score up freemail.  But I find that if it's
not somebody I correspond with (TXREP helps here), the probability
of mail from gmail being spam is pretty high.   I don't mean that gmail
emits mostly spam - just that after you set aside mail from people you
deal with, the ratio of "legit mail from a previously unknown
correspondent" to "spam" is not that high.

And I find gmail being in H3 to be wrong, but not my BL to run :-)
This is a difference in view between "very little of the mail is spam"
and "very little of the previously-unknown sender mail is spam".



signature.asc
Description: PGP signature


Re: Spoofed amazon order email

2021-04-16 Thread Antony Stone
On Friday 16 April 2021 at 17:10:14, Steve Dondley wrote:

> First, thanks to everyone on the list who has given me a hand over the
> past couple of weeks as I get my "sea legs" with spamassassin. It's
> working well for me now but I obviously still have more to learn.
> 
> For one, I'm still uncertain on the best way to fine tune SA to beat
> back some tricky spam. Like this one that comes from a gmail account but
> spoofs a fake, expensive order on amazon to try to phish the user.

Not an answer to your question, but a piece of advice about asking questions 
like this:

Don't paste the (suspect) spam email into what you post to the list:

1. The formatting may get corrupted either by your sending mail client or by 
recipients' mail clients, making it hard to read accurately

2. Many people on this list run spam filters (!) meaning that your posting may 
not reach them at all, because of its content

Far better to put the suspect mail onto pastebin.com or similar and then 
provide a link to that on this list.

Regards,


Antony.

-- 
Heisenberg, Gödel, and Chomsky walk in to a bar.
Heisenberg says, "Clearly this is a joke, but how can we work out if it's 
funny or not?"
Gödel replies, "We can't know that because we're inside the joke."
Chomsky says, "Of course it's funny. You're just saying it wrong."

   Please reply to the list;
 please *don't* CC me.


Spoofed amazon order email

2021-04-16 Thread Steve Dondley
First, thanks to everyone on the list who has given me a hand over the 
past couple of weeks as I get my "sea legs" with spamassassin. It's 
working well for me now but I obviously still have more to learn.


For one, I'm still uncertain on the best way to fine tune SA to beat 
back some tricky spam. Like this one that comes from a gmail account but 
spoofs a fake, expensive order on amazon to try to phish the user.


Return-Path: 
Delivered-To: s...@dondley.com
Received: from email.dondley.com
    by email.dondley.com with LMTP
    id Ev9rGkyheWBeegAAB604Gw
    (envelope-from )
    for ; Fri, 16 Apr 2021 10:38:04 -0400
Received: by email.dondley.com (Postfix, from userid 115)
    id 5EFD521516; Fri, 16 Apr 2021 10:38:04 -0400 (EDT)
Authentication-Results: email.dondley.com;
    dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="Fi/GiyLT";
    dkim-atps=neutral
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on email.dondley.com
X-Spam-Level:
X-Spam-Status: No, score=0.9 required=5.0 tests=BAYES_20,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,GB_FROM_NAME_FREEMAIL,
    HTML_MESSAGE,MIME_HTML_MOSTLY,NAME_EMAIL_DIFF,RCVD_IN_DNSWL_NONE,
    RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS
    shortcircuit=no autolearn=no autolearn_force=no version=3.4.2
X-Spam-Language: en
Received-SPF: Pass (mailfrom) identity=mailfrom;
    client-ip=209.85.216.54; helo=mail-pj1-f54.google.com;
    envelope-from=gk5751...@gmail.com; receiver=
Received: from mail-pj1-f54.google.com (mail-pj1-f54.google.com [209.85.216.54])
    by email.dondley.com (Postfix) with ESMTPS id 9DFB9210C1
    for ; Fri, 16 Apr 2021 10:37:53 -0400 (EDT)
Received: by mail-pj1-f54.google.com with SMTP id kb13-20020a17090ae7cdb02901503d67f0beso3185770pjb.0
    for ; Fri, 16 Apr 2021 07:37:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=gmail.com; s=20161025;
    h=message-id:date:from:mime-version:subject:to;
    bh=tbWgclEtavQLHj3b2u0ycLuH4u7X12CkOv+d/W8zWrs=;
    b=Fi/GiyLThBU+Sf1M8Thsh4lWYqGeC2mX1d6uL+5grFufl8EA68jtMePxe1TsIetKPj
     oCRdmdkjvxAGFA0Uny2lttK9Xhpmoa38zO0rLmFLN+tzKTHYuKKoiQx6ugByfCpk6A82
     QDyDgRp7HpEkA34ztYXqR9Q0MH8eTPPaK7iNTbdq2Sb78PYR+XNX9UVDnWarVSmlQm6N
     EwrQKnzaaT4WKuUrmXS8tkGJMLLfWxLQAu0oCxbKwDkjW7yLMVYGl1Zhk7tNjoi2Hk2r
     xywZ0v6AyAbSTawCrUN052ps4xjKR/o0CLHrkk+FLbu9wENYbhrDNb/HMRu20aTzEgHn
     AvZA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=1e100.net; s=20161025;
    h=x-gm-message-state:message-id:date:from:mime-version:subject:to;
    bh=tbWgclEtavQLHj3b2u0ycLuH4u7X12CkOv+d/W8zWrs=;
    b=D4cfDeHF3n8JokVklJNHvyFD04InVRxq/DLHtB+xrMenRQZDQPHMqH5KdJBAgs4hAD
     hc1YTl90K8wFUUAicyyzwhAzBTJqqCtmOZJczjjoXj9WXxEBqiJvgB5m2H+UvTejEX/0
     AA/Exf6uvfuGP5hsrp7o4i22DBc/FlZDVArJt7wN+u+zjO1+rRFgrfbW6fdWzgYkb6Y2
     jV/JTQywhNxSY6XaOSd4AA1i9ZC8LOaqkOLabUy1WI7uEWDOvzaO4MZuBzHi23vmdHlA
     weh507+u6rXpN6BarAXZEZxnC+yev86JRqtQjJZL5qTpbjhb2s/1g6wSeRNF1Ri7qIXs
     zbfA==
X-Gm-Message-State: AOAM5322u+9pAxfsMRqYaM8FgbXE+0nBCEZeqd286+mfRDrabuuIhCVe
    CLSzPPcNsg+v2Px14I1WF9r5vuoVLtg=
X-Google-Smtp-Source: ABdhPJw1ixhEhS6bCqFtjizgrTxFo6mCL1fEQPBSzQxIDGkIqIwR7np7Mgjy6ap0Lx6VHje5LfeKwQ==
X-Received: by 2002:a17:90a:5407:: with SMTP id z7mr10416174pjh.228.1618583872037;
    Fri, 16 Apr 2021 07:37:52 -0700 (PDT)
Received: from 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa ([104.143.92.92])
    by smtp.gmail.com with ESMTPSA id t15sm5203451pgh.33.2021.04.16.07.37.49
    for 
    (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
    Fri, 16 Apr 2021 07:37:51 -0700 (PDT)
Message-ID: <6079a13f.1c69fb81.a9651.e...@mx.google.com>
Date: Fri, 16 Apr 2021 07:37:51 -0700 (PDT)
From: "or...@amazon.com" 
X-Google-Original-From: "or...@amazon.com" 
Content-Type: multipart/alternative; boundary="===2707982310301423984=="
MIME-Version: 1.0
Subject: IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed
To: s...@dondley.com
--===2707982310301423984==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

Hello there, S!

This is a test template...

--===2707982310301423984==
Content-Type: text/html; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit




href="https://go.pardot.com/unsubscribe/u/272832/9445773a5f7e92b64a4b106d30d12be4ec08e6d19850125ed1a094fe7f00100f/734801457" target="_blank">List-Unsubscribe

Your Order | Your Account | Amazon.com
ORDER NUMBER # IVK-1250703-9254770

Dear S
Thank you for shopping with us. You have ordered the Apple Watch Series 6 Space Gray 44 mm GPS + Cellular
In-case you require any change in order or like to ca
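
When comparing reports like the one in the headers above, it can help to pull the score and the list of tests out of X-Spam-Status programmatically. The following Python sketch is one way to do that for the header layout shown in this message; parse_spam_status is a made-up helper and the parsing is an assumption based on this one example, not a specification of the header format.

```python
import re

# X-Spam-Status value copied from the headers above (folded lines kept).
STATUS = """No, score=0.9 required=5.0 tests=BAYES_20,DKIM_SIGNED,
 DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,GB_FROM_NAME_FREEMAIL,
 HTML_MESSAGE,MIME_HTML_MOSTLY,NAME_EMAIL_DIFF,RCVD_IN_DNSWL_NONE,
 RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS
 shortcircuit=no autolearn=no autolearn_force=no version=3.4.2"""


def parse_spam_status(value: str) -> dict:
    """Split an X-Spam-Status value into verdict, score, threshold and test names."""
    # Unfold the header, then close up the whitespace that folding left after commas.
    flat = re.sub(r",\s+", ",", " ".join(value.split()))
    verdict = flat.split(",", 1)[0].strip()
    fields = dict(re.findall(r"(\w+)=(\S+)", flat))
    return {
        "is_spam": verdict.lower() == "yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": [t for t in fields.get("tests", "").split(",") if t],
    }


if __name__ == "__main__":
    status = parse_spam_status(STATUS)
    print(status["score"], status["required"], len(status["tests"]))
```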

Re: sa-learn using multiple CPUs?

2021-04-16 Thread Axb

How hard is it to keep list mail on list and not reply directly to sender?

Have you seen
https://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/ 
?


there may be some helpful info in there.

On 4/16/21 9:47 AM, Christian Völker wrote:
Thanks for the hint. I will monitor it. The machine has 16GB of memory 
which should be sufficient but I already notice the preallocation of 
redis with 2GB.


It is somehow unclear what happens. If there is no limit I will get an 
OOM error and redis will (if killed) lose the last transactions after 
the last "save 900 1" snapshot, right?


If I set a limit it will discard the oldest entries, correct?

Both seem not to be perfect for Spamassassin.

However, I will ignore the topic for the moment and see how it goes. 
16GB should (hopefully) be enough. Once everything is scanned, 
SpamAssassin's expiry rules should kick in and reduce the memory used.


Greetings

/Christian




On 16.04.2021 at 09:15, Axb wrote:

To avoid surprises, remember to watch your memory usage.
Redis reads/writes the DB in memory and only dumps to disk for backup.

"redis-cli info" is of help


On 4/16/21 9:10 AM, Christian Völker wrote:

Sorry to annoy you. Another addition to my tests:

When using redis it took me around 15 seconds to scan ~1,500 messages.
When using MariaDB it took one minute to do the same.
With file based I had strange issues whatever lock type I used 
(flock yes/no):
"bayes: bayes db version 0 is not able to be used, aborting! at 
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 206."



Anyways, now using Redis which appears to be the fastest.

Thanks again!

/Christian



On 16.04.2021 at 08:48, Christian Völker wrote:

Hi,

So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


Ok, had a look when using MariaDB and I monitored it for the last 
24hrs. My 10 vCPUs were used, no I/O waits. But CPU usage overall 
was according to "top" only at 25% as top showed 75% idle. I assume 
there is some locking in place limiting the CPU usage.


I configured it now to use Redis instead of MySQL and top tells me 
about 25% idle with 0% I/O waits when running 10 sa-learn in 
parallel. Increasing or decreasing the number of jobs does not 
significantly change the idle percentage.


So using redis the CPU usage is higher compared to MySQL.

Thanks for ideas!

/Christian


Re: sa-learn using multiple CPUs?

2021-04-16 Thread Axb

To avoid surprises, remember to watch your memory usage.
Redis reads/writes the DB in memory and only dumps to disk for backup.
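
Watching memory can be scripted around the output of redis-cli info. The Python sketch below parses the key:value text that INFO returns and computes what fraction of RAM Redis is using. The helper names are invented here, the sample text is illustrative rather than captured from a real server, and the 16 GiB total is just an example figure.

```python
import subprocess


def fetch_redis_info() -> str:
    """Run `redis-cli info memory` and return its raw text (needs redis-cli installed)."""
    result = subprocess.run(
        ["redis-cli", "info", "memory"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_redis_info(text: str) -> dict:
    """Turn INFO output ("key:value" lines, "# Section" headings) into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def memory_fraction_used(info: dict, total_bytes: int) -> float:
    """Fraction of total_bytes taken by Redis, based on the used_memory field."""
    return int(info["used_memory"]) / total_bytes


if __name__ == "__main__":
    # Illustrative sample, not real server output.
    sample = "# Memory\r\nused_memory:2147483648\r\nmaxmemory:0\r\n"
    info = parse_redis_info(sample)
    print(memory_fraction_used(info, 16 * 1024**3))
```

Run from cron with fetch_redis_info() in place of the sample, this gives a simple early warning before the box runs out of memory.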

"redis-cli info" is of help


On 4/16/21 9:10 AM, Christian Völker wrote:

Sorry to annoy you. Another addition to my tests:

When using redis it took me around 15 seconds to scan ~1,500 messages.
When using MariaDB it took one minute to do the same.
With file based I had strange issues whatever lock type I used (flock 
yes/no):
"bayes: bayes db version 0 is not able to be used, aborting! at 
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 206."



Anyways, now using Redis which appears to be the fastest.

Thanks again!

/Christian



On 16.04.2021 at 08:48, Christian Völker wrote:

Hi,

So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


Ok, had a look when using MariaDB and I monitored it for the last 
24hrs. My 10 vCPUs were used, no I/O waits. But CPU usage overall was 
according to "top" only at 25% as top showed 75% idle. I assume there 
is some locking in place limiting the CPU usage.


I configured it now to use Redis instead of MySQL and top tells me 
about 25% idle with 0% I/O waits when running 10 sa-learn in parallel. 
Increasing or decreasing the number of jobs does not significantly 
change the idle percentage.


So using redis the CPU usage is higher compared to MySQL.

Thanks for ideas!

/Christian


Re: sa-learn using multiple CPUs?

2021-04-16 Thread Christian Völker

Sorry to annoy you. Another addition to my tests:

When using redis it took me around 15 seconds to scan ~1,500 messages.
When using MariaDB it took one minute to do the same.
With file based I had strange issues whatever lock type I used (flock 
yes/no):
"bayes: bayes db version 0 is not able to be used, aborting! at 
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 206."



Anyways, now using Redis which appears to be the fastest.
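
Put as throughput, the two timings above work out as follows. This is plain arithmetic on the approximate figures given in this message (about 1,500 messages in about 15 seconds versus one minute), so the results are only as precise as those estimates.

```python
def throughput(messages: int, seconds: float) -> float:
    """Messages processed per second."""
    return messages / seconds


redis_rate = throughput(1500, 15)    # 100.0 messages/second
mariadb_rate = throughput(1500, 60)  # 25.0 messages/second
speedup = redis_rate / mariadb_rate  # 4.0, i.e. Redis was about four times faster here

if __name__ == "__main__":
    print(redis_rate, mariadb_rate, speedup)
```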

Thanks again!

/Christian



On 16.04.2021 at 08:48, Christian Völker wrote:

Hi,

So I will re-configure my installation to use MariaDB.

You should also consider the Redis backend.


Ok, had a look when using MariaDB and I monitored it for the last 
24hrs. My 10 vCPUs were used, no I/O waits. But CPU usage overall was 
according to "top" only at 25% as top showed 75% idle. I assume there 
is some locking in place limiting the CPU usage.


I configured it now to use Redis instead of MySQL and top tells me 
about 25% idle with 0% I/O waits when running 10 sa-learn in parallel. 
Increasing or decreasing the number of jobs does not significantly 
change the idle percentage.


So using redis the CPU usage is higher compared to MySQL.
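
For reference, one way of running several sa-learn jobs in parallel, as described above, is sketched below in Python. The exact setup used for these tests is not shown in the thread, so this is an assumption: the corpus is taken to be a directory of one-message-per-file mails, the helper names are invented, and very large shards may need to be chunked further to stay within the command-line length limit.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def shard(items: list, n: int) -> list:
    """Split items into at most n round-robin shards, dropping empty ones."""
    shards = [items[i::n] for i in range(n)]
    return [s for s in shards if s]


def learn_shard(paths: list, kind: str = "spam") -> int:
    """Run one sa-learn process over a list of message files; return its exit code."""
    cmd = ["sa-learn", f"--{kind}", *map(str, paths)]
    return subprocess.run(cmd, check=False).returncode


def learn_parallel(corpus_dir: str, jobs: int = 10, kind: str = "spam") -> list:
    """Train on every file in corpus_dir using up to `jobs` concurrent sa-learn runs."""
    files = sorted(p for p in Path(corpus_dir).iterdir() if p.is_file())
    # Threads are enough here: each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda s: learn_shard(s, kind), shard(files, jobs)))
```

Calling learn_parallel("/path/to/spam", jobs=10) would start up to ten sa-learn processes at once; whether that scales depends on the Bayes backend, as the MariaDB and Redis observations above suggest.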

Thanks for ideas!

/Christian