Bayes in V4 compared to V3
Hi.

I have SA 4.0.1 configured and all is good, except for Bayes. It IS working and it IS learning, but when it classifies mail it is not nearly as decisive as it was in V3.

I have:

dbg: bayes: corpus size: nspam = 1190, nham = 12441
dbg: bayes: DB expiry: tokens in DB: 979401, Expiry max size: 150, Oldest atime: 1725361640, Newest atime: 1725888528, Last expire: 0, Current time: 1725888537

So I have enough spam/ham and more than enough tokens...

What I find weird is this:

BAYES_50 and BAYES_40 have about 10,000 hits EACH, which is A LOT
BAYES_80 only 600
BAYES_95 even less: 341
BAYES_99: 284
BAYES_20 only 150
BAYES_60 only 87

I have no BAYES lower than 40 at all.

I am training manually and also use autolearn. I have also transferred a corpus trained on SA v3, where it worked correctly.

Is SpamAssassin v4 really so much more conservative, or am I doing something wrong here?

Also, one more thing... Some mails don't even have a BAYES rule in the score list, confirmed on 2 installs:

 1.95 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date
 1.10 DCC_CHECK Detected as bulk mail by DCC (dcc-servers.net)
 0.10 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid
-0.50 DKIM_VALID Message has at least one valid DKIM or DK signature
-1.00 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain
-0.10 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain
-0.00 DMARC_PASS DMARC pass policy
 0.25 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit
 0.30 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
 0.00 HTML_MESSAGE HTML included in message
-0.00 RCVD_IN_DNSWL_NONE Sender listed at https://www.dnswl.org/, no trust
-0.00 SPF_HELO_PASS SPF: HELO matches SPF record
 2.50 URIBL_DBL_PHISH Contains a Phishing URL listed in the Spamhaus DBL blocklist

But a lot of mails do have Bayes scores. There are no errors in the logs and everything else is working fine...

I also tried to empty and clear the Bayes DB and retrain it, with the same results...

Am I doing something wrong?

Regards,
Grega
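
A distribution like the one reported above can be tallied mechanically rather than by eye. The following is a minimal Python sketch, assuming only that each scanned message leaves a line somewhere that lists the rule names it hit; the `tests=...` sample lines and the function names are hypothetical, not SpamAssassin's own log format or API.

```python
import re
from collections import Counter

# Matches Bayes rule names such as BAYES_00, BAYES_50, BAYES_99.
BAYES_RULE = re.compile(r"\bBAYES_(\d{2,3})\b")


def tally_bayes(lines):
    """Count how often each BAYES_NN rule name appears in the given lines."""
    counts = Counter()
    for line in lines:
        for bucket in BAYES_RULE.findall(line):
            counts["BAYES_" + bucket] += 1
    return counts


def shares(counts):
    """Return each bucket's share of all Bayes hits, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rule: 100.0 * n / total for rule, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample lines; real input would come from your own logs.
    sample = [
        "tests=BAYES_50,DKIM_SIGNED,HTML_MESSAGE",
        "tests=BAYES_40,SPF_HELO_PASS",
        "tests=BAYES_50,FREEMAIL_FROM",
        "tests=DCC_CHECK,URIBL_DBL_PHISH",  # no Bayes rule hit at all
    ]
    counts = tally_bayes(sample)
    for rule, pct in sorted(shares(counts).items()):
        print(f"{rule}: {counts[rule]} hits ({pct:.1f}%)")
```

Comparing the number of input lines with the sum of the counts also gives a rough figure for how many messages carried no BAYES rule at all, which is the second symptom described in this message.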
Re: Bayes "corpus" - how old?
On 2024-01-31 at 08:16:13 UTC-0500 (Wed, 31 Jan 2024 14:16:13 +0100) Matus UHLAR - fantomas is rumored to have said:

On 2024-01-30 at 12:08:18 UTC-0500 (Tue, 30 Jan 2024 18:08:18 +0100) Matus UHLAR - fantomas is rumored to have said:

[...]

autolearn may help if your DB is well maintained, although I have disabled nearly all rules with negative scores, like RCVD_IN_DNSWL_* RCVD_IN_IADB_* DKIMWL_WL_* RCVD_IN_MSPIKE_* RCVD_IN_VALIDITY_* USER_IN_DEF_* ALL_TRUSTED etc, because spammers often abuse these. I mean, they may have negative score but don't train on them.

On 30.01.24 15:31, Bill Cole wrote:

If spammers can 'abuse' ALL_TRUSTED you have a major problem. Either a serious misconfiguration or compromised machines in trusted_networks.

Can't ALL_TRUSTED happen if spammer delivers mail directly to my network, or, if last mail server removes Received: headers? I think this happened to me in the past but I may be wrong

I just did a manual test on my personal machine to confirm: mail entered manually in a connection to port 25 from an unprivileged network with no Received headers did NOT get an ALL_TRUSTED match.

The semantics around the word 'trusted' in SA are subtle and arcane. There's an important distinction between trusting that a particular MTA writes transparent and honest Received headers and trusting that a particular MTA does not relay spam. For example, I have 2 address blocks in my trusted_networks that are used by the ASF for forwarding, which I needed precisely because those machines sometimes forward spam and I need SA to look beyond the immediate clients, which I know tell me the truth about where they get the spam they offer me.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Bayes "corpus" - how old?
On 2024-01-30 at 12:08:18 UTC-0500 (Tue, 30 Jan 2024 18:08:18 +0100) Matus UHLAR - fantomas is rumored to have said:

[...]

autolearn may help if your DB is well maintained, although I have disabled nearly all rules with negative scores, like RCVD_IN_DNSWL_* RCVD_IN_IADB_* DKIMWL_WL_* RCVD_IN_MSPIKE_* RCVD_IN_VALIDITY_* USER_IN_DEF_* ALL_TRUSTED etc, because spammers often abuse these. I mean, they may have negative score but don't train on them.

On 30.01.24 15:31, Bill Cole wrote:

If spammers can 'abuse' ALL_TRUSTED you have a major problem. Either a serious misconfiguration or compromised machines in trusted_networks.

Can't ALL_TRUSTED happen if spammer delivers mail directly to my network, or, if last mail server removes Received: headers? I think this happened to me in the past but I may be wrong

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
LSD will make your ECS screen display 16.7 million colors
Re: Bayes "corpus" - how old?
On 2024-01-30 at 12:08:18 UTC-0500 (Tue, 30 Jan 2024 18:08:18 +0100) Matus UHLAR - fantomas is rumored to have said:

[...]

autolearn may help if your DB is well maintained, although I have disabled nearly all rules with negative scores, like RCVD_IN_DNSWL_* RCVD_IN_IADB_* DKIMWL_WL_* RCVD_IN_MSPIKE_* RCVD_IN_VALIDITY_* USER_IN_DEF_* ALL_TRUSTED etc, because spammers often abuse these. I mean, they may have negative score but don't train on them.

If spammers can 'abuse' ALL_TRUSTED you have a major problem. Either a serious misconfiguration or compromised machines in trusted_networks.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Bayes "corpus" - how old?
On 30.01.24 09:59, joe a wrote:

Advisable to "prune" Bayes data based on age?

While cleaning up recent Ham/Spam, found my "saved SPAM" goes back to 2013. Why that's over . . . wait, I need to take off my socks . . .

So, how old is "too old". For saved SPAM?

On 1/30/2024 10:58:52, Matus UHLAR - fantomas wrote:

I did retrain on old spam a few times and it was working fine. Depends on how much mail you have:

0.000 0 7542 0 non-token data: nspam
0.000 0 80869 0 non-token data: nham
0.000 0 996032 0 non-token data: ntokens
0.000 0 1172945918 0 non-token data: oldest atime

so, even old spam may be fine. You however need plenty of ham to train, otherwise everything starts looking like spam.

On 30.01.24 11:12, joe a wrote:

Recently missed spam has increased a bit, so I was dropping it into "missed spam" and went poking through marked spam and found lots of "missed ham". Which triggered my pondering.

Training on false positives/false negatives is important to keep it up to date. Full retraining only makes sense if you lose your DB, it gets corrupted, or it starts misclassifying too often (whether or not the reason is known).

autolearn may help if your DB is well maintained, although I have disabled nearly all rules with negative scores, like RCVD_IN_DNSWL_* RCVD_IN_IADB_* DKIMWL_WL_* RCVD_IN_MSPIKE_* RCVD_IN_VALIDITY_* USER_IN_DEF_* ALL_TRUSTED etc, because spammers often abuse these. I mean, they may have negative score but don't train on them.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
M$ Win's are shit, do not use it !
Re: Bayes "corpus" - how old?
On 1/30/2024 10:58:52, Matus UHLAR - fantomas wrote:

On 30.01.24 09:59, joe a wrote:

Advisable to "prune" Bayes data based on age?

While cleaning up recent Ham/Spam, found my "saved SPAM" goes back to 2013. Why that's over . . . wait, I need to take off my socks . . .

So, how old is "too old". For saved SPAM?

I did retrain on old spam a few times and it was working fine. Depends on how much mail you have:

0.000 0 7542 0 non-token data: nspam
0.000 0 80869 0 non-token data: nham
0.000 0 996032 0 non-token data: ntokens
0.000 0 1172945918 0 non-token data: oldest atime

so, even old spam may be fine. You however need plenty of ham to train, otherwise everything starts looking like spam.

Recently missed spam has increased a bit, so I was dropping it into "missed spam" and went poking through marked spam and found lots of "missed ham". Which triggered my pondering.
Re: Bayes "corpus" - how old?
On 2024-01-30 at 09:59:52 UTC-0500 (Tue, 30 Jan 2024 09:59:52 -0500) joe a is rumored to have said:

Advisable to "prune" Bayes data based on age?

Yes. That is why it has an expiration model. Expiration may be de facto blocked on some busy systems so you may need to explicitly force it occasionally. The command "sa-learn --dump magic" will show you expiration and other Bayes metadata.

While cleaning up recent Ham/Spam, found my "saved SPAM" goes back to 2013. Why that's over . . . wait, I need to take off my socks . . .

I've still got some almost 3x as old. BUT: I do not use it for training SA today.

So, how old is "too old". For saved SPAM?

I would suggest a year as the outer edge of Bayes usefulness.

I find it helpful to keep my decades of garbage because I use them (and my ham archive) in developing prospective rules. There are non-obvious fingerprints in some spam that imply decades-long spamming operations.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
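
The metadata printed by "sa-learn --dump magic" is easy to post-process. Below is a minimal Python sketch, assuming the output has the shape quoted elsewhere in this thread (a few numeric columns followed by "non-token data: <name>", with the value in the third column). The function names and the one-year threshold check are illustrative assumptions, not part of SpamAssassin.

```python
MAGIC_PREFIX = "non-token data: "

# Sample taken from the dump quoted in this thread.
SAMPLE = """\
0.000 0 7542 0 non-token data: nspam
0.000 0 80869 0 non-token data: nham
0.000 0 996032 0 non-token data: ntokens
0.000 0 1172945918 0 non-token data: oldest atime
"""


def parse_magic(text):
    """Parse 'non-token data' lines into a {name: integer value} dict."""
    values = {}
    for line in text.splitlines():
        if MAGIC_PREFIX not in line:
            continue
        fields, name = line.split(MAGIC_PREFIX, 1)
        columns = fields.split()
        if len(columns) < 3:
            continue
        try:
            # Assumption: the value sits in the third column, as in the
            # dumps quoted in this thread.
            values[name.strip()] = int(columns[2])
        except ValueError:
            continue
    return values


def age_in_days(atime, now):
    """Age of a Unix timestamp in days, relative to another Unix timestamp."""
    return (now - atime) / 86400.0


if __name__ == "__main__":
    import time

    magic = parse_magic(SAMPLE)
    print("ham per spam:", magic["nham"] / magic["nspam"])
    days = age_in_days(magic["oldest atime"], time.time())
    # One year is the rule of thumb suggested in this message.
    print("oldest token older than a year:", days > 365)
```

In practice the text would come from running the command and capturing its standard output instead of the embedded sample.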
Re: Bayes "corpus" - how old?
On 30.01.24 09:59, joe a wrote:

Advisable to "prune" Bayes data based on age?

While cleaning up recent Ham/Spam, found my "saved SPAM" goes back to 2013. Why that's over . . . wait, I need to take off my socks . . .

So, how old is "too old". For saved SPAM?

I did retrain on old spam a few times and it was working fine. Depends on how much mail you have:

0.000 0 7542 0 non-token data: nspam
0.000 0 80869 0 non-token data: nham
0.000 0 996032 0 non-token data: ntokens
0.000 0 1172945918 0 non-token data: oldest atime

so, even old spam may be fine. You however need plenty of ham to train, otherwise everything starts looking like spam.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.
Bayes "corpus" - how old?
Advisable to "prune" Bayes data based on age? While cleaning up recent Ham/Spam, found my "saved SPAM" goes back to 2013. Why that's over . . . wait, I need to take off my socks . . . So, how old is "too old". For saved SPAM?
Re: Bayes Stopword
This is what I believe: the words need to be trimmed or separated, and careful consideration is required to determine the language in order to perform accurate cutoffs.

Jimmy

On Fri, Dec 29, 2023 at 5:16 PM wrote:
> "ทุก" is not considered a word because it's part of the token "ทุกวันพุธเล่นชนะรับเพิ่ม".
> Words must be separated by spaces, otherwise we should skip the word "theme" just because "the" is in english stopword list.
> No idea if this makes sense for asian languages.
>
> Giovanni
>
> On 12/29/23 11:04, Jimmy wrote:
> >
> > The sample email and word list should contain at least these words.
> >
> > ถูก
> > เลย
> > ทุก
> >
> > Jimmy
> >
> > On Fri, Dec 29, 2023 at 4:47 PM <giova...@paclan.it> wrote:
> >
> > I do not speak Thai but I cannot see any word in the sample email that should match that list.
> > Which word do you think should match the regexp ?
> > Giovanni
> >
> > On 12/29/23 10:08, Jimmy wrote:
> > > You can use this word list
> > >
> > > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt
> > >
> > > Jimmy
> > >
> > > On Fri, Dec 29, 2023 at 3:59 PM <giova...@paclan.it> wrote:
> > >
> > > To create the stopwords regexp I used the script I shared in a previous email and a list of words one per line.
> > > Could you share the list you are using ?
> > > > > > Giovanni > > > > > > On 12/29/23 09:22, Jimmy wrote: > > > > I use SpamAssassin 4.0.0 (2022-12-14) > > > > > > > > $ spamassassin -D --lint 2>&1 | grep bayes: > > > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found > lang=en > > > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found > lang=th > > > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found > lang=ru > > > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found > lang=fr > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=ja > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=zh > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=dk > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=nl > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=de > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=es > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=fi > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=fr > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=it > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=no > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=ru > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=se > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=tr > > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found > lang=vi > > > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found > lang=ko > > > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found > lang=zh > > > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found > lang=hi > > > > Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for > languages enabled: en th ru fr ja zh dk nl de es fi fr it no ru se tr vi ko > zh hi > > > > > > > > > > > > $ spamassassin -D bayes,learn < test.msg 2>&1 | grep > "skipped token" > > > > Dec 29 15:16:57.585 [17347] dbg: bayes: 
skipped token > 'Email' because it's in stopword lis
Re: Bayes Stopword
"ทุก" is not considered a word because it's part of the token "ทุกวันพุธเล่นชนะรับเพิ่ม". Words must be separated by spaces, otherwise we should skip the word "theme" just because "the" is in english stopword list. No idea if this makes sense for asian languages. Giovanni On 12/29/23 11:04, Jimmy wrote: The sample email and word list should contain at least these words. ถูก เลย ทุก Jimmy On Fri, Dec 29, 2023 at 4:47 PM mailto:giova...@paclan.it>> wrote: I do not speak Thai but I cannot see any word in the sample email that should match that list. Which word do you think should match the regexp ? Giovanni On 12/29/23 10:08, Jimmy wrote: > You can use this word list > > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt <https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt> <https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt <https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt>> > > Jimmy > > On Fri, Dec 29, 2023 at 3:59 PM mailto:giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>>> wrote: > > To create the stopwords regexp I used the script I shared in a previous email and a list of words one per line. > Could you share the list you are using ? 
> > Giovanni > > On 12/29/23 09:22, Jimmy wrote: > > I use SpamAssassin 4.0.0 (2022-12-14) > > > > $ spamassassin -D --lint 2>&1 | grep bayes: > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=en > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=ru > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=fr > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ja > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=zh > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=dk > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=nl > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=de > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=es > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fi > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fr > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=it > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=no > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ru > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=se > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=tr > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=vi > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=ko > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=zh > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=hi > > Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for languages enabled: en th ru fr ja zh dk nl de es fi fr it no ru se tr vi ko zh hi > > > > > > $ spamassassin -D bayes,learn < test.msg 2>&1 | grep "skipped token" > > Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' because it's in stopword list for language 'en' > > > > You can use "บาท" that was listed in regexp pattern but somehow I don't know why it not show skipped token 
in bayes. > > > > Jimmy > > > > > > On Fri, Dec 29, 2023 at 2:59 PM mailto:giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>> <mailto:giova...@paclan.it <mailto:giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>>>> wrote: > > > > Config line produces a syntax error for me: > > config: failed to parse line in /etc/mail/spamassassin/local.cf <http://local.cf> <http://local.cf <http://local.cf>> <http://local.cf <http://local.cf> <http://local.cf <http://local.cf>>> (line 1): bayes_stopword_th > > > > Could you share the word list in utf8 ?
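
Giovanni's explanation at the top of this message (a stopword only counts when it is a whole token, and tokens are separated by spaces) can be illustrated with a small Python sketch. This is not the code in Plugin/Bayes.pm, which is Perl and tokenizes mail in more detail; the function name and the plain whitespace split are assumptions made for illustration only.

```python
import re


def skipped_tokens(text, stopwords):
    """Return the whitespace-separated tokens that are, as a whole, stopwords."""
    pattern = re.compile("(?:" + "|".join(re.escape(w) for w in stopwords) + ")")
    # fullmatch requires the entire token to be a stopword, so "theme" is
    # not skipped merely because it starts with "the".
    return [tok for tok in text.split() if pattern.fullmatch(tok)]


if __name__ == "__main__":
    thai_stopwords = ["ถูก", "เลย", "ทุก"]

    # English: only the standalone word is skipped.
    print(skipped_tokens("the theme", ["the"]))

    # Thai text written without spaces is a single token, so the stopword
    # inside it is not seen as a word of its own.
    print(skipped_tokens("ทุกวันพุธเล่นชนะรับเพิ่ม", thai_stopwords))

    # Once the text is segmented into words, the stopword is found.
    print(skipped_tokens("ทุก วันพุธ เล่น ชนะ", thai_stopwords))
```

This is why Jimmy's suggestion of segmenting the text first matters for languages written without spaces: without a segmentation step, whole-token matching has nothing to match against.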
Re: Bayes Stopword
The sample email and word list should contain at least these words. ถูก เลย ทุก Jimmy On Fri, Dec 29, 2023 at 4:47 PM wrote: > I do not speak Thai but I cannot see any word in the sample email that > should match that list. > Which word do you think should match the regexp ? > Giovanni > > On 12/29/23 10:08, Jimmy wrote: > > You can use this word list > > > > > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt > < > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt > > > > > > Jimmy > > > > On Fri, Dec 29, 2023 at 3:59 PM giova...@paclan.it>> wrote: > > > > To create the stopwords regexp I used the script I shared in a > previous email and a list of words one per line. > > Could you share the list you are using ? > > > > Giovanni > > > > On 12/29/23 09:22, Jimmy wrote: > > > I use SpamAssassin 4.0.0 (2022-12-14) > > > > > > $ spamassassin -D --lint 2>&1 | grep bayes: > > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=en > > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th > > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=ru > > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=fr > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ja > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=zh > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=dk > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=nl > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=de > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=es > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fi > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fr > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=it > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=no > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ru > > > Dec 29 
15:17:56.920 [17420] dbg: bayes: stopword found lang=se > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=tr > > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=vi > > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=ko > > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=zh > > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=hi > > > Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for languages > enabled: en th ru fr ja zh dk nl de es fi fr it no ru se tr vi ko zh hi > > > > > > > > > $ spamassassin -D bayes,learn < test.msg 2>&1 | grep "skipped > token" > > > Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' > because it's in stopword list for language 'en' > > > > > > You can use "บาท" that was listed in regexp pattern but somehow I > don't know why it not show skipped token in bayes. > > > > > > Jimmy > > > > > > > > > On Fri, Dec 29, 2023 at 2:59 PM giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>>> > wrote: > > > > > > Config line produces a syntax error for me: > > > config: failed to parse line in /etc/mail/spamassassin/ > local.cf <http://local.cf> <http://local.cf <http://local.cf>> (line 1): > bayes_stopword_th > > > > > > Could you share the word list in utf8 ? > > > I tried adding "บาท" to > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt > < > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt> > < > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt > < > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt>> > and it produces a working regexp. > > > Bayes stopwords languages must also be enabled using > "bayes_stopword_languages" config keyword, by default only english is > enabled. > > >Giovanni > > > > > > On 12/28/23 17:06, Jimmy wrote: > > > > bayes_stopwor
Re: Bayes Stopword
I do not speak Thai but I cannot see any word in the sample email that should match that list. Which word do you think should match the regexp ? Giovanni On 12/29/23 10:08, Jimmy wrote: You can use this word list https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt <https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt> Jimmy On Fri, Dec 29, 2023 at 3:59 PM mailto:giova...@paclan.it>> wrote: To create the stopwords regexp I used the script I shared in a previous email and a list of words one per line. Could you share the list you are using ? Giovanni On 12/29/23 09:22, Jimmy wrote: > I use SpamAssassin 4.0.0 (2022-12-14) > > $ spamassassin -D --lint 2>&1 | grep bayes: > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=en > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=ru > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=fr > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ja > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=zh > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=dk > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=nl > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=de > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=es > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fi > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fr > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=it > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=no > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ru > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=se > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=tr > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=vi > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=ko > Dec 29 15:17:56.921 
[17420] dbg: bayes: stopword found lang=zh > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=hi > Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for languages enabled: en th ru fr ja zh dk nl de es fi fr it no ru se tr vi ko zh hi > > > $ spamassassin -D bayes,learn < test.msg 2>&1 | grep "skipped token" > Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' because it's in stopword list for language 'en' > > You can use "บาท" that was listed in regexp pattern but somehow I don't know why it not show skipped token in bayes. > > Jimmy > > > On Fri, Dec 29, 2023 at 2:59 PM mailto:giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>>> wrote: > > Config line produces a syntax error for me: > config: failed to parse line in /etc/mail/spamassassin/local.cf <http://local.cf> <http://local.cf <http://local.cf>> (line 1): bayes_stopword_th > > Could you share the word list in utf8 ? > I tried adding "บาท" to https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt <https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt> <https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt <https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt>> and it produces a working regexp. > Bayes stopwords languages must also be enabled using "bayes_stopword_languages" config keyword, by default only english is enabled. 
> Giovanni > > On 12/28/23 17:06, Jimmy wrote: > > bayes_stopword_th https://pastebin.pl/view/0838138d <https://pastebin.pl/view/0838138d> <https://pastebin.pl/view/0838138d <https://pastebin.pl/view/0838138d>> <https://pastebin.pl/view/0838138d <https://pastebin.pl/view/0838138d> <https://pastebin.pl/view/0838138d <https://pastebin.pl/view/0838138d>>> > > Sample mail https://pastebin.pl/view/e5a2c5b8 <https://pastebin.pl/view/e5a2c5b8> <https://pastebin.pl/view/e5a2c5b8 <https://pastebin.pl/view/e5a2c5b8>> <https://pastebin.pl/view/e5a2c5b8 <https://pastebin.pl/view/e5a2c5b8> <https://pastebin.pl/view/e5a2c5b8 <https://pastebin.pl/view/e5a2c5b8>>> > > > > Jimmy > > > > > > On Thu, Dec 28, 2023 at 10:59 PM mailto:gio
Re: Bayes Stopword
You can use this word list https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt Jimmy On Fri, Dec 29, 2023 at 3:59 PM wrote: > To create the stopwords regexp I used the script I shared in a previous > email and a list of words one per line. > Could you share the list you are using ? > >Giovanni > > On 12/29/23 09:22, Jimmy wrote: > > I use SpamAssassin 4.0.0 (2022-12-14) > > > > $ spamassassin -D --lint 2>&1 | grep bayes: > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=en > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=ru > > Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=fr > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ja > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=zh > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=dk > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=nl > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=de > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=es > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fi > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fr > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=it > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=no > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ru > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=se > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=tr > > Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=vi > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=ko > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=zh > > Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=hi > > Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for languages enabled: > en th ru fr ja zh dk nl de es fi fr it no ru se tr vi 
ko zh hi > > > > > > $ spamassassin -D bayes,learn < test.msg 2>&1 | grep "skipped token" > > Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' because > it's in stopword list for language 'en' > > > > You can use "บาท" that was listed in regexp pattern but somehow I don't > know why it not show skipped token in bayes. > > > > Jimmy > > > > > > On Fri, Dec 29, 2023 at 2:59 PM giova...@paclan.it>> wrote: > > > > Config line produces a syntax error for me: > > config: failed to parse line in /etc/mail/spamassassin/local.cf < > http://local.cf> (line 1): bayes_stopword_th > > > > Could you share the word list in utf8 ? > > I tried adding "บาท" to > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt > < > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt> > and it produces a working regexp. > > Bayes stopwords languages must also be enabled using > "bayes_stopword_languages" config keyword, by default only english is > enabled. > >Giovanni > > > > On 12/28/23 17:06, Jimmy wrote: > > > bayes_stopword_th https://pastebin.pl/view/0838138d < > https://pastebin.pl/view/0838138d> <https://pastebin.pl/view/0838138d < > https://pastebin.pl/view/0838138d>> > > > Sample mail https://pastebin.pl/view/e5a2c5b8 < > https://pastebin.pl/view/e5a2c5b8> <https://pastebin.pl/view/e5a2c5b8 < > https://pastebin.pl/view/e5a2c5b8>> > > > > > > Jimmy > > > > > > > > > On Thu, Dec 28, 2023 at 10:59 PM giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>>> > wrote: > > > > > > Could you share a config line and a sample you are using ? > > >Giovanni > > > > > > On 12/28/23 16:26, Jimmy wrote: > > > > Yes, I have done that, and I am also editing > Plugin/Bayes.pm to investigate why it is not being skipped. I suspect that > if words are not separated by spaces, longer words may not match those > patterns. 
> > > > > > > > Jimmy > > > > > > > > On Thu, Dec 28, 2023 at 10:13 PM <mailto:giova...@paclan.it> <mailto:giova...@paclan.it giova...@paclan.it>> <mailto:giova...@
Re: Bayes Stopword
To create the stopwords regexp I used the script I shared in a previous email and a list of words, one per line.
Could you share the list you are using?

Giovanni

On 12/29/23 09:22, Jimmy wrote:
> I use SpamAssassin 4.0.0 (2022-12-14)
>
> $ spamassassin -D --lint 2>&1 | grep bayes:
> Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=en
> Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th
> Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=ru
> Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=fr
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ja
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=zh
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=dk
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=nl
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=de
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=es
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fi
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fr
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=it
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=no
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ru
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=se
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=tr
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=vi
> Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=ko
> Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=zh
> Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=hi
> Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for languages enabled: en th ru fr ja zh dk nl de es fi fr it no ru se tr vi ko zh hi
>
> $ spamassassin -D bayes,learn < test.msg 2>&1 | grep "skipped token"
> Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' because it's in stopword list for language 'en'
>
> You can use "บาท" that was listed in regexp pattern but somehow I don't know why it not show skipped token in bayes.
>
> Jimmy
>
> On Fri, Dec 29, 2023 at 2:59 PM <giova...@paclan.it> wrote:
> > Config line produces a syntax error for me:
> > config: failed to parse line in /etc/mail/spamassassin/local.cf (line 1): bayes_stopword_th
> >
> > Could you share the word list in utf8 ?
> > I tried adding "บาท" to https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt and it produces a working regexp.
> > Bayes stopwords languages must also be enabled using "bayes_stopword_languages" config keyword, by default only english is enabled.
> >
> > Giovanni
> >
> > On 12/28/23 17:06, Jimmy wrote:
> > > bayes_stopword_th https://pastebin.pl/view/0838138d
> > > Sample mail https://pastebin.pl/view/e5a2c5b8
> > >
> > > Jimmy
> > >
> > > On Thu, Dec 28, 2023 at 10:59 PM <giova...@paclan.it> wrote:
> > > > Could you share a config line and a sample you are using ?
> > > > Giovanni
> > > >
> > > > On 12/28/23 16:26, Jimmy wrote:
> > > > > Yes, I have done that, and I am also editing Plugin/Bayes.pm to investigate why it is not being skipped. I suspect that if words are not separated by spaces, longer words may not match those patterns.
> > > > >
> > > > > Jimmy
> > > > >
> > > > > On Thu, Dec 28, 2023 at 10:13 PM <giova...@paclan.it> wrote:
> > > > > > "spamassassin -D bayes" will tell you, you should see a line like:
> > > > > > bayes: skipped token 'from' because it's in stopword list for language 'en'
> > > > > >
> > > > > > Giovanni
> > > > > >
> > > > > > On 12/28/23 15:45, Jimmy wrote:
> > > > > > > The pattern has successfully passed the test script, but it needs to check whether Bayes learning will identify and possibly exclude the word from matching this pattern.
> > > > > > >
> > > > > > > Thank you.
> > > > > > >
> > > > > > > On Thu, Dec 28, 2023 at 9:22 PM <giova...@paclan.it>
Re: Bayes Stopword
I use SpamAssassin 4.0.0 (2022-12-14)

$ spamassassin -D --lint 2>&1 | grep bayes:
Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=en
Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th
Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=ru
Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=fr
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ja
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=zh
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=dk
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=nl
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=de
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=es
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fi
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fr
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=it
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=no
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ru
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=se
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=tr
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=vi
Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=ko
Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=zh
Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=hi
Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for languages enabled: en th ru fr ja zh dk nl de es fi fr it no ru se tr vi ko zh hi

$ spamassassin -D bayes,learn < test.msg 2>&1 | grep "skipped token"
Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' because it's in stopword list for language 'en'

You can use "บาท", which is listed in the regexp pattern, but I don't know why it does not show up as a skipped token in bayes.
Jimmy On Fri, Dec 29, 2023 at 2:59 PM wrote: > Config line produces a syntax error for me: > config: failed to parse line in /etc/mail/spamassassin/local.cf (line 1): > bayes_stopword_th > > Could you share the word list in utf8 ? > I tried adding "บาท" to > https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt > and it produces a working regexp. > Bayes stopwords languages must also be enabled using > "bayes_stopword_languages" config keyword, by default only english is > enabled. > Giovanni > > On 12/28/23 17:06, Jimmy wrote: > > bayes_stopword_th https://pastebin.pl/view/0838138d < > https://pastebin.pl/view/0838138d> > > Sample mail https://pastebin.pl/view/e5a2c5b8 < > https://pastebin.pl/view/e5a2c5b8> > > > > Jimmy > > > > > > On Thu, Dec 28, 2023 at 10:59 PM giova...@paclan.it>> wrote: > > > > Could you share a config line and a sample you are using ? > >Giovanni > > > > On 12/28/23 16:26, Jimmy wrote: > > > Yes, I have done that, and I am also editing Plugin/Bayes.pm to > investigate why it is not being skipped. I suspect that if words are not > separated by spaces, longer words may not match those patterns. > > > > > > Jimmy > > > > > > On Thu, Dec 28, 2023 at 10:13 PM giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>>> > wrote: > > > > > > "spamassassin -D bayes" will tell you, you should see a line > like: > > > bayes: skipped token 'from' because it's in stopword list for > language 'en' > > > > > >Giovanni > > > > > > On 12/28/23 15:45, Jimmy wrote: > > > > The pattern has successfully passed the test script, but > it needs to check whether Bayes learning will identify and possibly exclude > the word from matching this pattern. > > > > > > > > Thank you. 
Re: Bayes Stopword
Config line produces a syntax error for me:
config: failed to parse line in /etc/mail/spamassassin/local.cf (line 1): bayes_stopword_th

Could you share the word list in utf8 ?
I tried adding "บาท" to https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt and it produces a working regexp.
Bayes stopwords languages must also be enabled using the "bayes_stopword_languages" config keyword; by default only English is enabled.

Giovanni

On 12/28/23 17:06, Jimmy wrote:
> bayes_stopword_th https://pastebin.pl/view/0838138d
> Sample mail https://pastebin.pl/view/e5a2c5b8
>
> Jimmy
>
> On Thu, Dec 28, 2023 at 10:59 PM <giova...@paclan.it> wrote:
> > Could you share a config line and a sample you are using ?
> > Giovanni
> >
> > On 12/28/23 16:26, Jimmy wrote:
> > > Yes, I have done that, and I am also editing Plugin/Bayes.pm to investigate why it is not being skipped. I suspect that if words are not separated by spaces, longer words may not match those patterns.
> > >
> > > Jimmy
> > >
> > > On Thu, Dec 28, 2023 at 10:13 PM <giova...@paclan.it> wrote:
> > > > "spamassassin -D bayes" will tell you, you should see a line like:
> > > > bayes: skipped token 'from' because it's in stopword list for language 'en'
> > > >
> > > > Giovanni
> > > >
> > > > On 12/28/23 15:45, Jimmy wrote:
> > > > > The pattern has successfully passed the test script, but it needs to check whether Bayes learning will identify and possibly exclude the word from matching this pattern.
> > > > >
> > > > > Thank you.
> > > > >
> > > > > On Thu, Dec 28, 2023 at 9:22 PM <giova...@paclan.it> wrote:
> > > > > > On 12/28/23 12:59, Jimmy wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm seeking assistance in incorporating a stopword for Asian languages in Unicode.
Although I possess comprehensive word lists, my attempts to generate a regex pattern and test it have been unsuccessful; the pattern fails to match or skips tokens in the newly added stopword list. > > > > > > I created the regex pattern using the following code: > > > > > > Regexp::Assemble->new->add(@words)->reduce(0)->as_string > > > > > > Afterward, I converted it to UTF-8 hex. > > > > > > I'm wondering if there are any tools available to facilitate the creation of these regex patterns. > > > > > I have used Regexp::Trie to create Bayes stopwords in the past, code is similar to: > > --- > > use strict; > > use warnings; > > > > use Encode; > > use Regexp::Trie; > > > > my @input = ; > > my $rt = Regexp::Trie->new; > > for my $w ( @input ) { > > chomp($w); > > $rt->add($w); > > } > > my $regexp = $rt->regexp; > > my @reg = split //, $regexp; > > for my $c ( @reg ) { > > my $char = $c; > > my $test; > > eval "\$test = decode( 'utf8', \$c, Encode::FB_CROAK )"; > > if( $@ ) { > > print 'x' . sprintf("%x", ord($c)); > > } else { > > print $char; > > } > > } > > --- > > > > Giovanni > > > OpenPGP_signature.asc Description: OpenPGP digital signature
Re: Bayes Stopword
bayes_stopword_th https://pastebin.pl/view/0838138d Sample mail https://pastebin.pl/view/e5a2c5b8 Jimmy On Thu, Dec 28, 2023 at 10:59 PM wrote: > Could you share a config line and a sample you are using ? > Giovanni > > On 12/28/23 16:26, Jimmy wrote: > > Yes, I have done that, and I am also editing Plugin/Bayes.pm to > investigate why it is not being skipped. I suspect that if words are not > separated by spaces, longer words may not match those patterns. > > > > Jimmy > > > > On Thu, Dec 28, 2023 at 10:13 PM giova...@paclan.it>> wrote: > > > > "spamassassin -D bayes" will tell you, you should see a line like: > > bayes: skipped token 'from' because it's in stopword list for > language 'en' > > > >Giovanni > > > > On 12/28/23 15:45, Jimmy wrote: > > > The pattern has successfully passed the test script, but it needs > to check whether Bayes learning will identify and possibly exclude the word > from matching this pattern. > > > > > > Thank you. > > > > > > > > > On Thu, Dec 28, 2023 at 9:22 PM giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>>> > wrote: > > > > > > On 12/28/23 12:59, Jimmy wrote: > > > > Hi, > > > > > > > > I'm seeking assistance in incorporating a stopword for > Asian languages in Unicode. Although I possess comprehensive word lists, my > attempts to generate a regex pattern and test it have been unsuccessful; > the pattern fails to match or skips tokens in the newly added stopword list. > > > > > > > > I created the regex pattern using the following code: > > > > > > > > Regexp::Assemble->new->add(@words)->reduce(0)->as_string > > > > > > > > Afterward, I converted it to UTF-8 hex. > > > > > > > > I'm wondering if there are any tools available to > facilitate the creation of these regex patterns. 
> > > > > > > I have used Regexp::Trie to create Bayes stopwords in the > past, code is similar to: > > > > > --- > > > use strict; > > > use warnings; > > > > > > use Encode; > > > use Regexp::Trie; > > > > > > my @input = ; > > > my $rt = Regexp::Trie->new; > > > for my $w ( @input ) { > > > chomp($w); > > > $rt->add($w); > > > } > > > my $regexp = $rt->regexp; > > > my @reg = split //, $regexp; > > > for my $c ( @reg ) { > > > my $char = $c; > > > my $test; > > > eval "\$test = decode( 'utf8', \$c, Encode::FB_CROAK )"; > > > if( $@ ) { > > > print 'x' . sprintf("%x", ord($c)); > > > } else { > > > print $char; > > > } > > > } > > > > > --- > > > > > >Giovanni > > > > > > >
Re: Bayes Stopword
Could you share a config line and a sample you are using ? Giovanni On 12/28/23 16:26, Jimmy wrote: Yes, I have done that, and I am also editing Plugin/Bayes.pm to investigate why it is not being skipped. I suspect that if words are not separated by spaces, longer words may not match those patterns. Jimmy On Thu, Dec 28, 2023 at 10:13 PM mailto:giova...@paclan.it>> wrote: "spamassassin -D bayes" will tell you, you should see a line like: bayes: skipped token 'from' because it's in stopword list for language 'en' Giovanni On 12/28/23 15:45, Jimmy wrote: > The pattern has successfully passed the test script, but it needs to check whether Bayes learning will identify and possibly exclude the word from matching this pattern. > > Thank you. > > > On Thu, Dec 28, 2023 at 9:22 PM mailto:giova...@paclan.it> <mailto:giova...@paclan.it <mailto:giova...@paclan.it>>> wrote: > > On 12/28/23 12:59, Jimmy wrote: > > Hi, > > > > I'm seeking assistance in incorporating a stopword for Asian languages in Unicode. Although I possess comprehensive word lists, my attempts to generate a regex pattern and test it have been unsuccessful; the pattern fails to match or skips tokens in the newly added stopword list. > > > > I created the regex pattern using the following code: > > > > Regexp::Assemble->new->add(@words)->reduce(0)->as_string > > > > Afterward, I converted it to UTF-8 hex. > > > > I'm wondering if there are any tools available to facilitate the creation of these regex patterns. > > > I have used Regexp::Trie to create Bayes stopwords in the past, code is similar to: > --- > use strict; > use warnings; > > use Encode; > use Regexp::Trie; > > my @input = ; > my $rt = Regexp::Trie->new; > for my $w ( @input ) { > chomp($w); > $rt->add($w); > } > my $regexp = $rt->regexp; > my @reg = split //, $regexp; > for my $c ( @reg ) { > my $char = $c; > my $test; > eval "\$test = decode( 'utf8', \$c, Encode::FB_CROAK )"; > if( $@ ) { > print 'x' . 
sprintf("%x", ord($c)); > } else { > print $char; > } > } > --- > > Giovanni > OpenPGP_signature.asc Description: OpenPGP digital signature
Re: Bayes Stopword
Yes, I have done that, and I am also editing Plugin/Bayes.pm to investigate why it is not being skipped. I suspect that if words are not separated by spaces, longer words may not match those patterns. Jimmy On Thu, Dec 28, 2023 at 10:13 PM wrote: > "spamassassin -D bayes" will tell you, you should see a line like: > bayes: skipped token 'from' because it's in stopword list for language 'en' > > Giovanni > > On 12/28/23 15:45, Jimmy wrote: > > The pattern has successfully passed the test script, but it needs to > check whether Bayes learning will identify and possibly exclude the word > from matching this pattern. > > > > Thank you. > > > > > > On Thu, Dec 28, 2023 at 9:22 PM giova...@paclan.it>> wrote: > > > > On 12/28/23 12:59, Jimmy wrote: > > > Hi, > > > > > > I'm seeking assistance in incorporating a stopword for Asian > languages in Unicode. Although I possess comprehensive word lists, my > attempts to generate a regex pattern and test it have been unsuccessful; > the pattern fails to match or skips tokens in the newly added stopword list. > > > > > > I created the regex pattern using the following code: > > > > > > Regexp::Assemble->new->add(@words)->reduce(0)->as_string > > > > > > Afterward, I converted it to UTF-8 hex. > > > > > > I'm wondering if there are any tools available to facilitate the > creation of these regex patterns. > > > > > I have used Regexp::Trie to create Bayes stopwords in the past, code > is similar to: > > > > --- > > use strict; > > use warnings; > > > > use Encode; > > use Regexp::Trie; > > > > my @input = ; > > my $rt = Regexp::Trie->new; > > for my $w ( @input ) { > > chomp($w); > > $rt->add($w); > > } > > my $regexp = $rt->regexp; > > my @reg = split //, $regexp; > > for my $c ( @reg ) { > > my $char = $c; > > my $test; > > eval "\$test = decode( 'utf8', \$c, Encode::FB_CROAK )"; > > if( $@ ) { > > print 'x' . sprintf("%x", ord($c)); > > } else { > > print $char; > > } > > } > > > > --- > > > >Giovanni > > > >
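A small sketch for checking which tokens were actually skipped, based on the debug line quoted above. It is written in Python rather than Perl and only parses text that has already been captured from "spamassassin -D bayes"; the function name skipped_tokens is made up for illustration, and the message wording is assumed to be exactly what the thread shows.

```python
import re
from collections import defaultdict

# Message format as quoted in this thread; other SpamAssassin versions
# may word it differently (assumption).
SKIPPED = re.compile(
    r"bayes: skipped token '(?P<token>.*)' because it's in stopword list "
    r"for language '(?P<lang>[^']+)'"
)


def skipped_tokens(lines):
    """Group skipped tokens by stopword language (illustrative helper)."""
    found = defaultdict(list)
    for line in lines:
        match = SKIPPED.search(line)
        if match:
            found[match.group("lang")].append(match.group("token"))
    return dict(found)
```

Feeding it the captured debug output and looking for the 'th' key shows at a glance whether any Thai token was skipped at all.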
Re: Bayes Stopword
"spamassassin -D bayes" will tell you, you should see a line like: bayes: skipped token 'from' because it's in stopword list for language 'en' Giovanni On 12/28/23 15:45, Jimmy wrote: The pattern has successfully passed the test script, but it needs to check whether Bayes learning will identify and possibly exclude the word from matching this pattern. Thank you. On Thu, Dec 28, 2023 at 9:22 PM mailto:giova...@paclan.it>> wrote: On 12/28/23 12:59, Jimmy wrote: > Hi, > > I'm seeking assistance in incorporating a stopword for Asian languages in Unicode. Although I possess comprehensive word lists, my attempts to generate a regex pattern and test it have been unsuccessful; the pattern fails to match or skips tokens in the newly added stopword list. > > I created the regex pattern using the following code: > > Regexp::Assemble->new->add(@words)->reduce(0)->as_string > > Afterward, I converted it to UTF-8 hex. > > I'm wondering if there are any tools available to facilitate the creation of these regex patterns. > I have used Regexp::Trie to create Bayes stopwords in the past, code is similar to: --- use strict; use warnings; use Encode; use Regexp::Trie; my @input = ; my $rt = Regexp::Trie->new; for my $w ( @input ) { chomp($w); $rt->add($w); } my $regexp = $rt->regexp; my @reg = split //, $regexp; for my $c ( @reg ) { my $char = $c; my $test; eval "\$test = decode( 'utf8', \$c, Encode::FB_CROAK )"; if( $@ ) { print 'x' . sprintf("%x", ord($c)); } else { print $char; } } --- Giovanni OpenPGP_signature.asc Description: OpenPGP digital signature
Re: Bayes Stopword
The pattern has successfully passed the test script, but it needs to check whether Bayes learning will identify and possibly exclude the word from matching this pattern. Thank you. On Thu, Dec 28, 2023 at 9:22 PM wrote: > On 12/28/23 12:59, Jimmy wrote: > > Hi, > > > > I'm seeking assistance in incorporating a stopword for Asian languages > in Unicode. Although I possess comprehensive word lists, my attempts to > generate a regex pattern and test it have been unsuccessful; the pattern > fails to match or skips tokens in the newly added stopword list. > > > > I created the regex pattern using the following code: > > > > Regexp::Assemble->new->add(@words)->reduce(0)->as_string > > > > Afterward, I converted it to UTF-8 hex. > > > > I'm wondering if there are any tools available to facilitate the > creation of these regex patterns. > > > I have used Regexp::Trie to create Bayes stopwords in the past, code is > similar to: > > --- > use strict; > use warnings; > > use Encode; > use Regexp::Trie; > > my @input = ; > my $rt = Regexp::Trie->new; > for my $w ( @input ) { >chomp($w); >$rt->add($w); > } > my $regexp = $rt->regexp; > my @reg = split //, $regexp; > for my $c ( @reg ) { >my $char = $c; >my $test; >eval "\$test = decode( 'utf8', \$c, Encode::FB_CROAK )"; >if( $@ ) { > print 'x' . sprintf("%x", ord($c)); >} else { > print $char; >} > } > > --- > > Giovanni >
Re: Bayes Stopword
On 12/28/23 12:59, Jimmy wrote:
> Hi,
>
> I'm seeking assistance in incorporating a stopword for Asian languages in Unicode. Although I possess comprehensive word lists, my attempts to generate a regex pattern and test it have been unsuccessful; the pattern fails to match or skips tokens in the newly added stopword list.
>
> I created the regex pattern using the following code:
>
> Regexp::Assemble->new->add(@words)->reduce(0)->as_string
>
> Afterward, I converted it to UTF-8 hex.
>
> I'm wondering if there are any tools available to facilitate the creation of these regex patterns.

I have used Regexp::Trie to create Bayes stopwords in the past; the code is similar to:

---
use strict;
use warnings;

use Encode;
use Regexp::Trie;

# Read the word list, one word per line. The archive dropped the
# filehandle on this line; reading from STDIN is an assumption.
my @input = <STDIN>;
my $rt = Regexp::Trie->new;
for my $w ( @input ) {
  chomp($w);
  $rt->add($w);
}
my $regexp = $rt->regexp;
# Walk the generated regexp one element at a time: anything that does
# not decode as UTF-8 on its own is printed as 'x' plus its hex value,
# everything else is printed unchanged.
my @reg = split //, $regexp;
for my $c ( @reg ) {
  my $char = $c;
  my $test;
  eval "\$test = decode( 'utf8', \$c, Encode::FB_CROAK )";
  if( $@ ) {
    print 'x' . sprintf("%x", ord($c));
  } else {
    print $char;
  }
}
---

Giovanni
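For readers without Perl at hand, the same idea can be sketched in Python. This is not the Regexp::Trie script above: it builds a plain alternation rather than a trie, the function name words_to_byte_regex is made up for illustration, and writing non-ASCII bytes as \xNN escapes is an assumption about the form the stopword regexp is expected to take.

```python
import re


def words_to_byte_regex(words):
    """Return one alternation pattern matching any of the given words.

    Illustrative helper, not part of SpamAssassin. Non-ASCII characters
    are written as the hex escapes of their UTF-8 bytes; ASCII characters
    are regex-escaped and otherwise kept readable.
    """
    encoded = {w.strip().encode("utf-8") for w in words if w.strip()}
    # Longest first, so a short word never shadows a longer word that
    # starts with it.
    ordered = sorted(encoded, key=lambda raw: (-len(raw), raw))
    parts = []
    for raw in ordered:
        parts.append("".join(
            re.escape(chr(byte)) if byte < 0x80 else "\\x%02x" % byte
            for byte in raw
        ))
    return "(?:" + "|".join(parts) + ")"
```

Called with a list read from a UTF-8 word file (one word per line), it returns a single ASCII-only string. Whether that string can be used as-is in a bayes_stopword_* line has not been checked here.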
Bayes Stopword
Hi, I'm seeking assistance in incorporating a stopword for Asian languages in Unicode. Although I possess comprehensive word lists, my attempts to generate a regex pattern and test it have been unsuccessful; the pattern fails to match or skips tokens in the newly added stopword list. I created the regex pattern using the following code: Regexp::Assemble->new->add(@words)->reduce(0)->as_string Afterward, I converted it to UTF-8 hex. I'm wondering if there are any tools available to facilitate the creation of these regex patterns. Thank you, Jimmy
Re: Bayes always reject.
> From: Pierluigi Frullani
> Date: Wed, 13 Dec 2023 07:49:24 +0100
>
> Hello all,
> I'm facing a strange problem.
...
> tests=BAYES_95,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS,T_SCC_BODY_TEXT_LINE

How did you feed this message into SpamAssassin? Did you do something to strip off all of the email headers?

For the BAYES_99, as already mentioned, you probably need to retrain bayes, making sure to correct any incorrectly trained email messages.

-jeff
Re: Bayes always reject.
On 2023-12-13 at 01:49:24 UTC-0500 (Wed, 13 Dec 2023 07:49:24 +0100)
Pierluigi Frullani is rumored to have said:

> Hello all,
> I'm facing a strange problem.

Not really. MANY people run into this issue...

> I've feed the bayes db for a while and now I would like to put it in use but all messages get a BAYES_99 and very high spam point.
> I would like to understand why, and troubleshoot this problem but I can't find a way.

The only reasons that can happen are:

1. All of your mail is in fact spam.
2. Your Bayes DB is mis-trained.

The fix (assuming #2) is to recreate the Bayes DB with proper training. *IN THEORY* one could fix a corrupted DB by 'unlearning' messages which were learned incorrectly, but as a practical matter that's usually a fantasy.

Most of the scanning and DB details that you included are not useful. You cannot fix the bad DB, you need to rebuild it.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Bayes always reject.
Hello all,
I'm facing a strange problem.
I've fed the bayes db for a while and now I would like to put it to use, but all messages get BAYES_99 and a very high spam score.
I would like to understand why and troubleshoot this problem, but I can't find a way.

Spamassassin version is:

root@puma:~# spamassassin --version
SpamAssassin version 3.4.6
  running on Perl version 5.22.2

This is the sa-learn --dump magic:

root@puma:~# sa-learn --dump magic
0.000  0           3  0  non-token data: bayes db version
0.000  0      130610  0  non-token data: nspam
0.000  0      316040  0  non-token data: nham
0.000  0      136493  0  non-token data: ntokens
0.000  0  1695915149  0  non-token data: oldest atime
0.000  0  1702447561  0  non-token data: newest atime
0.000  0  1702449197  0  non-token data: last journal sync atime
0.000  0  1701476495  0  non-token data: last expiry atime
0.000  0     5529600  0  non-token data: last expire atime delta
0.000  0       34998  0  non-token data: last expire reduction count

and this is the spamassassin --lint -D:

root@puma:~# spamassassin -D --lint 2>&1 | grep -i bay
Dec 13 07:39:07.885 [26545] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 13 07:39:08.005 [26545] dbg: config: fixed relative path: /var/lib/spamassassin/3.004006/updates_spamassassin_org/23_bayes.cf
Dec 13 07:39:08.005 [26545] dbg: config: using "/var/lib/spamassassin/3.004006/updates_spamassassin_org/23_bayes.cf" for included file
Dec 13 07:39:08.005 [26545] dbg: config: read file /var/lib/spamassassin/3.004006/updates_spamassassin_org/23_bayes.cf
Dec 13 07:39:08.047 [26545] dbg: config: fixed relative path: /var/lib/spamassassin/3.004006/updates_spamassassin_org/60_bayes_stopwords.cf
Dec 13 07:39:08.047 [26545] dbg: config: using "/var/lib/spamassassin/3.004006/updates_spamassassin_org/60_bayes_stopwords.cf" for included file
Dec 13 07:39:08.047 [26545] dbg: config: read file /var/lib/spamassassin/3.004006/updates_spamassassin_org/60_bayes_stopwords.cf
Dec 13 07:39:08.292 [26545] dbg: shortcircuit: adding BAYES_99 using abbreviation spam
Dec 13 07:39:08.292 [26545] dbg: shortcircuit: adding BAYES_00 using abbreviation ham
Dec 13 07:39:08.586 [26545] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x5cca570) implements 'learner_new', priority 0
Dec 13 07:39:08.586 [26545] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x5cca570), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 13 07:39:08.594 [26545] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x6a51bb0)
Dec 13 07:39:08.594 [26545] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x5cca570) implements 'learner_is_scan_available', priority 0
Dec 13 07:39:08.595 [26545] dbg: bayes: tie-ing to DB file R/O /var/spamassasin/bayes_toks
Dec 13 07:39:08.595 [26545] dbg: bayes: tie-ing to DB file R/O /var/spamassasin/bayes_seen
Dec 13 07:39:08.595 [26545] dbg: bayes: found bayes db version 3
Dec 13 07:39:08.595 [26545] dbg: bayes: DB journal sync: last sync: 1702449197
Dec 13 07:39:08.621 [26545] dbg: bayes: DB journal sync: last sync: 1702449197
Dec 13 07:39:08.621 [26545] dbg: bayes: corpus size: nspam = 130610, nham = 316040
Dec 13 07:39:08.622 [26545] dbg: bayes: tokenized body: 120 tokens
Dec 13 07:39:08.622 [26545] dbg: bayes: tokenized uri: 0 tokens
Dec 13 07:39:08.622 [26545] dbg: bayes: tokenized invisible: 0 tokens
Dec 13 07:39:08.623 [26545] dbg: bayes: tokenized header: 14 tokens
Dec 13 07:39:08.623 [26545] dbg: bayes: score = 0.976034467829266
Dec 13 07:39:08.624 [26545] dbg: bayes: DB expiry: tokens in DB: 136493, Expiry max size: 15, Oldest atime: 1695915149, Newest atime: 1702447561, Last expire: 1701476495, Current time: 1702449548
Dec 13 07:39:08.624 [26545] dbg: bayes: DB journal sync: last sync: 1702449197
Dec 13 07:39:08.624 [26545] dbg: bayes: untie-ing
Dec 13 07:39:08.624 [26545] dbg: check: tagrun - tag BAYESTCHAMMY is now ready, value: 0
Dec 13 07:39:08.624 [26545] dbg: check: tagrun - tag BAYESTCSPAMMY is now ready, value: 2
Dec 13 07:39:08.624 [26545] dbg: check: tagrun - tag BAYESTCLEARNED is now ready, value: 4
Dec 13 07:39:08.624 [26545] dbg: check: tagrun - tag BAYESTC is now ready, value: 20
Dec 13 07:39:08.628 [26545] dbg: rules: ran eval rule BAYES_95 ==> got hit (1)
Dec 13 07:39:08.863 [26545] dbg: check: tests=BAYES_95,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS,T_SCC_BODY_TEXT_LINE
Dec 13 07:39:08.864 [26545] dbg: timing: total 1004 ms - init: 738 (73.5%), parse: 0.85 (0.1%), extract_message_metadata: 1.10 (0.1%), get_uri_detail_list: 3.9 (0.4%), tests_pri_-2000: 4.3 (0.4%), compile_gen: 85 (8.5%), compile_eval: 13 (1.3%), tests_pri_-1000: 3.6 (0.4%), tests_pri_-950: 2.8 (0.3%), tests_pri_-900: 4.2 (0.4%), tests_
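When comparing databases before and after a rebuild, it can help to pull the counters out of the "sa-learn --dump magic" listing. The following is a minimal Python sketch that parses text in the layout shown in this message; the function name parse_dump_magic is made up, and it assumes the third column always holds the value, as it does in the listing above. The counters alone cannot show whether a database was mis-trained.

```python
def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value.

    Illustrative helper; assumes the column layout shown in this thread,
    where the third whitespace-separated field is the value.
    """
    values = {}
    for line in text.splitlines():
        head, sep, name = line.partition("non-token data:")
        if not sep:
            continue  # not a magic line
        fields = head.split()
        if len(fields) >= 3:
            values[name.strip()] = int(fields[2])
    return values


def spam_share(values):
    """Fraction of learned messages that were learned as spam."""
    total = values["nspam"] + values["nham"]
    return values["nspam"] / total if total else 0.0
```

With the numbers posted here, nspam and nham are both far above the few hundred messages usually considered a minimum, so the volume of training is not the issue; what the messages were learned as is.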
Re: Share bayes database between servers
On Sun, Jul 09, 2023 at 07:06:10PM +0200, Robert Senger wrote:
> I've set up a testing environment that also uses master-master
> replication of the mysql bayes database, with priority in dns set to
> equal for both mx to get incoming mail distributed evenly to both
> systems. So far, this seems to work, but this is a low load
> environment.

It boils down to how much you trust MySQL master-master replication stability and performance, which depends heavily on your experience and the exact versions used (are we talking about Oracle MySQL, or the MariaDB or Percona forks? Which versions? What replication setup? etc.)

I've had problems under high concurrent load (not performance, but the replication setup breaking) in the past, so I prefer to avoid master-master replication if possible, especially if I anticipate high concurrent load. But if you are confident in it, sure, go ahead.

> Any suggestions?

Well, how are you training your bayes DB? If it is via cron and manually curated ham/spam corpora (the recommended way), I'd rather suggest keeping the databases separate and simply running training on both servers (you can duplicate or share the ham/spam corpora as you wish, from rsync to SMB/NFS).

If you are using auto-learn (which was not recommended last time I looked), you'd probably be better off NOT syncing bayes at all IMHO: it is preferable to confine the risk of bayes poisoning to one server instead of replicating it. There is also not much benefit, as auto-learn will quickly learn on each server separately anyway, and if one set of domains is not getting some type of spam, it is not beneficial to learn it there anyway.

--
Opinions above are GNU-copylefted.
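The "train from a curated corpus on each server" approach could be driven by a small script run from cron. The Python sketch below only assembles the sa-learn invocations; the directory paths are hypothetical, the helper names are made up, and the per-user options needed for a per-user SQL setup like the one described in this thread are deliberately left out.

```python
import subprocess


def training_commands(ham_dir, spam_dir, rebuild=False):
    """Return the sa-learn command lines for one training pass.

    With rebuild=True the existing database is wiped first, so the
    result depends only on the stored corpus.
    """
    commands = []
    if rebuild:
        commands.append(["sa-learn", "--clear"])
    commands.append(["sa-learn", "--ham", ham_dir])
    commands.append(["sa-learn", "--spam", spam_dir])
    return commands


def run_training(ham_dir, spam_dir, rebuild=False):
    """Run the commands in order, stopping at the first failure."""
    for command in training_commands(ham_dir, spam_dir, rebuild):
        subprocess.run(command, check=True)
```

Running the same script against the same (rsynced or shared) corpus on both servers keeps the two databases equivalent without any database replication.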
Re: Share bayes database between servers
On Sunday, 09.07.2023 at 19:21 +0200, Reindl Harald wrote:
>
> On 09.07.23 at 19:06, Robert Senger wrote:
> > But bayes data may be updated by either the primary mx or the
> > backup mx, since email may arrive at either server.
>
> in a smart setup your bayes-database is read-only like here since
> 2014, any autolearning disabled and strictly trained manually by a
> stored corpus giving you the opportunity to remove and add messages
> to the training folders and rebuild from scratch
>
> we share our bayes-db even with a different company since 2014

Well, that's the boring solution... ;)

Nevertheless, this is what I will likely do if I encounter any problems with the mysql master-master replication as I have it running now.

Robert

--
Robert Senger
Share bayes database between servers
Hi there, I am running two mailservers, first one serving two domains, other one serving one domain. Both serve as backup mx for each other. Both know about users and aliases of the other domain(s). On both systems, spamassassin is configured to read/store userprefs and bayes data (per user) in a local mysql database. Both systems reject email if the score exceeds a certain limit. To avoid backscatter (or the need to accept any spam not rejected by the backup mx), both servers should do their spam filtering based on exactly the same information, including bayes data. Now, the question is, what is the best way to share bayes data between two (or more) servers? I already share userprefs by setting up master-master replication between the two mysql databases on both servers. This is uncritical, since users (or admins) will update only userprefs for the local virtual users on each system, which means, backup mx will never touch primary mx userprefs. But bayes data may be updated by either the primary mx or the backup mx, since email may arrive at either server. I've set up a testing environment that also uses master-master replication of the mysql bayes database, with priority in dns set to equal for both mx to get incoming mail distributed evenly to both systems. So far, this seems to work, but this is a low load environment. Any suggestions? Regards, Robert -- Robert Senger
Re: BAYES scores
joe a wrote on 2023-02-28 17:37:

> Curious as to why these scores, apparently "stock" are what they are.
> I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>
> Noted in a header this morning:
>
> * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> * [score: 1.]
> * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> * [score: 1.]
>
> Was this discussed recently? I added a local score to mollify my sense
> of propriety.

What does it solve for you?

Maybe the rules could be changed so that they do not overlap, but what should the scores change to?
Re: BAYES scores
From: "Bill Cole"
> It is my understanding that an automated rescoring job was run quite some time ago (before I was on the PMC) to generate the Bayes scores, which determined that to be the best supplemental score to give to the greater certainty.

I was around in those days. My memory isn't the greatest anymore, but what I recall was that they did automatic rescoring, and then manually tweaked a few of the values, basically to make them look pretty by rounding off long fractions. BAYES_999 may have been scored almost completely manually, I can't quite recall.

Loren
Re: BAYES scores
joe a wrote on 2023-02-28 17:37:
> Curious as to why these scores, apparently "stock" are what they are.
> I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>
> Noted in a header this morning:
>
> * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> * [score: 1.]
> * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> * [score: 1.]
>
> Was this discussed recently? I added a local score to mollify my sense
> of propriety.

What does it solve for you?

Maybe the rules could be changed so that they do not overlap, but what should the scores change to?

The rules can be split so that their hits do not overlap, but what should the scores then change to?
Re: BAYES scores
On 2023-02-28 at 13:38:35 UTC-0500 (Tue, 28 Feb 2023 13:38:35 -0500) joe a is rumored to have said:

> On 2/28/2023 12:05 PM, Jeff Mincy wrote:
> > > From: joe a
> > > Date: Tue, 28 Feb 2023 11:37:34 -0500
> > >
> > > Curious as to why these scores, apparently "stock" are what they are.
> > > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
> > >
> > > Noted in a header this morning:
> > >
> > > * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> > > * [score: 1.]
> > > * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> > > * [score: 1.]
> > >
> > > Was this discussed recently? I added a local score to mollify my sense
> > > of propriety.
> >
> > Those two rules overlap. A message with bayes >= 99.9% hits both rules.
> > BAYES_99 ends at 1.00 not .999.
> > -jeff
>
> I get that they overlap. I guess my thinker gets in a knot wondering why
> there is so little weight given to the more certain determination.

It is my understanding that an automated rescoring job was run quite some time ago (before I was on the PMC) to generate the Bayes scores, which determined that to be the best supplemental score to give to the greater certainty.

Bayes rules are not rescored routinely in the daily rescoring task because those hits are inherently different at every site. If you wish to determine the ideal scores for YOUR mix of ham and spam, I believe all the tools for doing so are in the SA code tree, but they may not be well-documented.

That's likely to not be a satisfying answer, but as a volunteer project we have no funding for Customer Satisfaction, so the bare unsatisfying truth will have to do.

> In my narrow view, anything that is 99.9% certain is probably worth a 5 on
> its own. Or, at least should when, summed with BAYES_99, equal 5. As that
> is what the default "SPAM flag" is.
>
> Appears more experienced or thoughtful persons think otherwise.

I don't know that I'd go that far. Rescoring is not done based on simple clear reason, but on numbers. I'm not sure whether any currently active SA developers are able to explain exactly how the rescoring works.

> Yes, it did snow heavily overnight. Yes, I am looking for excuses not to
> visit that issue.

I vehemently recommend reading all of Justin's scripts and documentation (I think it's all in the 'build' sub-directory) and figuring out how to rescore based on your own mail. That's MUCH less unpleasant than dealing with the snow.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: BAYES scores
From my limited experience...

I score BAYES_999 at 2.00; that was suggested to me months ago. But nowadays I'd be more careful and do some more testing: I'd check which messages hit only BAYES_99 and which hit BAYES_999. If you are absolutely certain that BAYES_999 hits are only and definitely spam, go with 2 or more; if you see several false positives, keep the score low.

I learnt the hard way that Bayes depends on the corpus used to build the database.
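The check suggested above (which messages hit only BAYES_99 and which hit BAYES_999) could be sketched roughly as follows. This is only an illustration: the input format, a list of (is_spam, rule names) pairs taken from mail you have already classified by hand, and the function names are assumptions made for the example, not anything SpamAssassin provides.

```python
from collections import Counter


def bayes_bucket(tests):
    """Return which of the two high-end Bayes rules fired for one message."""
    if "BAYES_999" in tests:
        return "BAYES_999"
    if "BAYES_99" in tests:
        return "BAYES_99 only"
    return "neither"


def tally(messages):
    """Count messages per (hand label, Bayes bucket).

    `messages` is an iterable of (is_spam, rule_names) pairs; how you
    extract the rule names from your own mail is up to you.
    """
    counts = Counter()
    for is_spam, tests in messages:
        label = "spam" if is_spam else "ham"
        counts[(label, bayes_bucket(set(tests)))] += 1
    return counts


# Hypothetical, hand-labelled sample data.
sample = [
    (True, ["BAYES_99", "BAYES_999", "HTML_MESSAGE"]),
    (True, ["BAYES_99", "DKIM_SIGNED"]),
    (False, ["BAYES_99", "BAYES_999"]),  # a false positive
    (False, ["BAYES_50"]),
]
counts = tally(sample)
for key in sorted(counts):
    print(key, counts[key])
```

If the ("ham", "BAYES_999") count is anything but zero on your own mail, that argues for keeping the BAYES_999 score low.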
Re: BAYES scores
On 2/28/2023 12:05 PM, Jeff Mincy wrote:
> > From: joe a
> > Date: Tue, 28 Feb 2023 11:37:34 -0500
> >
> > Curious as to why these scores, apparently "stock" are what they are.
> > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
> >
> > Noted in a header this morning:
> >
> > * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> > * [score: 1.]
> > * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> > * [score: 1.]
> >
> > Was this discussed recently? I added a local score to mollify my sense
> > of propriety.
>
> Those two rules overlap. A message with bayes >= 99.9% hits both rules.
> BAYES_99 ends at 1.00 not .999.
> -jeff

I get that they overlap. I guess my thinker gets in a knot wondering why there is so little weight given to the more certain determination.

In my narrow view, anything that is 99.9% certain is probably worth a 5 on its own. Or at least it should equal 5 when summed with BAYES_99, as that is what the default "SPAM flag" is.

It appears that more experienced or thoughtful persons think otherwise.

Yes, it did snow heavily overnight. Yes, I am looking for excuses not to visit that issue.
Re: BAYES scores
> From: joe a
> Date: Tue, 28 Feb 2023 11:37:34 -0500
>
> Curious as to why these scores, apparently "stock" are what they are.
> I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>
> Noted in a header this morning:
>
> * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> * [score: 1.]
> * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> * [score: 1.]
>
> Was this discussed recently? I added a local score to mollify my sense
> of propriety.

Those two rules overlap. A message with bayes >= 99.9% hits both rules. BAYES_99 ends at 1.00 not .999.
-jeff
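To make the overlap concrete, here is a small sketch of how the two rules add up. The ranges and scores are taken from the header quoted in this thread; treating both range ends as inclusive is an assumption made for the illustration, and the function names are made up.

```python
# (low, high, score) per rule, as quoted in this thread.
# Inclusive bounds are an assumption of this sketch.
RULES = {
    "BAYES_99": (0.99, 1.00, 3.5),
    "BAYES_999": (0.999, 1.00, 0.2),
}


def bayes_hits(probability):
    """Return the names of the rules whose range contains the probability."""
    return [
        name
        for name, (low, high, _score) in RULES.items()
        if low <= probability <= high
    ]


def bayes_points(probability):
    """Sum the scores of every rule that hits."""
    return sum(RULES[name][2] for name in bayes_hits(probability))


print(bayes_hits(0.995), bayes_points(0.995))
print(bayes_hits(1.0), round(bayes_points(1.0), 3))
```

With the values above, a message at 99.9% or more collects both scores, 3.5 + 0.2 = 3.7 points in total, so BAYES_999 acts as a small supplement on top of BAYES_99 rather than as a separate verdict.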
BAYES scores
Curious as to why these scores, apparently "stock" are what they are. I'd expect BAYES_999 BODY to count more than BAYES_99 BODY. Noted in a header this morning: * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100% * [score: 1.] * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100% * [score: 1.] Was this discussed recently? I added a local score to mollify my sense of propriety.
Re: Strange findings debugging bayes results
On Mon, Feb 20, 2023 at 01:30:15PM -0800, Loren Wilton wrote:
> This is a home system with only a few users. All users have "Spam" and "Ham"
> folders showing up in their email program of choice, and they just drag
> messages they do or don't like into the appropriate folders. There are
> "Oldham" and "Oldspam" mboxes, and the new spam and ham (respectively) get
> merged into these folders after learning, and removed from the current Spam
> and Ham folders.

I had a similar idea but never implemented it because I felt it was too difficult for users to deal with. I was considering two folders, 'Spam Training Set' and 'Ham Training Set', which would always represent the set of messages that SpamAssassin was currently trained with. If you changed the contents of these mboxes, a cron job would delete the old Bayes tokens and retrain with the current set.

The difference between these folders and the Spam folder (or Junk, or whatever you call it locally) is that in the Spam folder, messages older than 30 days get auto-deleted. After 30 days, those messages would no longer represent the training set.

Having two spam folders is confusing and not easy to manage. Neither of these two extra folders is one that users would look in for messages, so they really do have to copy messages into them, which isn't just dragging them. That, for me, was the main issue, so I abandoned this line of thinking.

You mentioned harvesting ham and spam from mboxes, as in from the inbox directly. This got me wondering more about this. Messages that the user dragged to Spam but that SpamAssassin did not mark as spam can clearly be used to train as spam. Easy. Messages that the user left in their mailbox, deleted, or archived could be used as ham. That could be OK, but I'm less sure. And lastly, there are messages that were in Spam (since SpamAssassin marked them as spam) that a user moved out of Spam: just look through all their folders (except Spam) for messages that SpamAssassin marked as spam and retrain on those as ham. Again, maybe a bad assumption, but it could work.

I was really just curious to know if other people had workable ideas for getting Bayes trained with the least amount of friction.
Re: Strange findings debugging bayes results
From: "Reindl Harald"
> in other words a system for morons - morons which will drag mails to spam
> instead click on "unsubscribe"
> per-user bayes don't work well, never

Well Harald, you are certainly welcome to your opinion. It would be nicer if you had kept it to yourself, though. The system works just fine with the userbase it has. It probably wouldn't work for AOL or *.online.
Re: Strange findings debugging bayes results
This is a home system with only a few users. All users have "Spam" and "Ham" folders showing up in their email program of choice, and they just drag messages they do or don't like into the appropriate folders. There are "Oldham" and "Oldspam" mboxes, and the new spam and ham (respectively) get merged into these folders after learning, and removed from the current Spam and Ham folders. - Original Message - From: Michael Grant To: users@spamassassin.apache.org ; Loren Wilton ; hg user Sent: Monday, February 20, 2023 12:47 PM Subject: Re: Strange findings debugging bayes results On 20 February 2023 12:28:00 CET, Loren Wilton wrote: > > A cron job that will harvest Spam and Ham mboxes and feed them to sa-learn once a day, then archive the learned messages. Per-user bayes and learning. Mail is hand-moved into the spam and ham learning folders, and for my personal account, I do this rarely, generally only when a message is mis-categorized. Although messages being mis-categorized as spam is often the result of a lot of quite aggressive local rules I have rather than a Bayes mis-classification. When you "harvest" ham from mboxes, what do you consider ham? You also, additionally, have a Ham folder for your users then? Interesting. Did you manage to train your users to use it easily? Does it grow unbounded or are old messages removed from it? If so, how to know they can be deleted like from the Spam folder. It's an interesting idea, just wondering about the details. Getting my users to train spamassassim has always been impossible for me.
Re: Strange findings debugging bayes results
On 20 February 2023 12:28:00 CET, Loren Wilton wrote:
> A cron job that will harvest Spam and Ham mboxes and feed them to sa-learn
> once a day, then archive the learned messages. Per-user bayes and learning.
> Mail is hand-moved into the spam and ham learning folders, and for my
> personal account, I do this rarely, generally only when a message is
> mis-categorized. Although messages being mis-categorized as spam is often the
> result of a lot of quite aggressive local rules I have rather than a Bayes
> mis-classification.

When you "harvest" ham from mboxes, what do you consider ham?

You also, additionally, have a Ham folder for your users then? Interesting. Did you manage to train your users to use it easily? Does it grow unbounded, or are old messages removed from it? If so, how do you know they can be deleted, as with the Spam folder?

It's an interesting idea, just wondering about the details. Getting my users to train SpamAssassin has always been impossible for me.
Re: Strange findings debugging bayes results
> Can you please give me some details on your bayes setup? > Headers exclusion, bayes_token_sources, how do you "sa-learn" messages... Standard options on Bayes. No autolearn. A cron job that will harvest Spam and Ham mboxes and feed them to sa-learn once a day, then archive the learned messages. Per-user bayes and learning. Mail is hand-moved into the spam and ham learning folders, and for my personal account, I do this rarely, generally only when a message is mis-categorized. Although messages being mis-categorized as spam is often the result of a lot of quite aggressive local rules I have rather than a Bayes mis-classification.
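For anyone who wants to try the same workflow, the daily job described above might look roughly like the sketch below. It is not Loren's actual script: the function names and the folder handling are assumptions made for the example. It relies only on the `--spam`, `--ham` and `--mbox` options of sa-learn and on Python's standard `mailbox` module.

```python
import mailbox
import subprocess


def learn_command(kind, mbox_path):
    """Build the sa-learn command line for one mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", "--mbox", str(mbox_path)]


def archive_and_clear(new_path, old_path):
    """Append every message in new_path to old_path, then empty new_path."""
    new_box = mailbox.mbox(str(new_path))
    old_box = mailbox.mbox(str(old_path))
    new_box.lock()
    old_box.lock()
    try:
        for message in new_box:
            old_box.add(message)
        old_box.flush()
        new_box.clear()
        new_box.flush()
    finally:
        old_box.unlock()
        new_box.unlock()
        old_box.close()
        new_box.close()


def harvest(kind, new_path, old_path):
    """Learn from the folder first; archive only if sa-learn succeeded."""
    subprocess.run(learn_command(kind, new_path), check=True)
    archive_and_clear(new_path, old_path)


print(learn_command("spam", "Spam"))
```

A cron entry would then call something like `harvest("spam", "Spam", "Oldspam")` and `harvest("ham", "Ham", "Oldham")` once a day for each user. Learning before archiving means a failed sa-learn run leaves the messages in place for the next attempt.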
Re: Strange findings debugging bayes results
Can you please give me some details on your bayes setup? Headers exclusion, bayes_token_sources, how do you "sa-learn" messages... thank you On Sun, Feb 19, 2023 at 11:53 PM Loren Wilton wrote: > > The real question is: has bayes still its use case in 2023 ? Is it still > used with important scores or just to flag messages for a review? > > It works fine for me here. > >
Re: Strange findings debugging bayes results
> The real question is: has bayes still its use case in 2023 ? Is it still used > with important scores or just to flag messages for a review? It works fine for me here.
Re: Strange findings debugging bayes results
> bayes_token_sources none visible uri mimepart

I added this line to my config, with no change in the tokens used to compute the Bayes score: headers are still used. It may be an option that is only honored during learning, but I should check the sources.

> perhaps OP has bayes_token_sources setting that takes only headers
> into the account?

No. That mail had very few words in the text, and the Bayes system probably considered them not relevant.

The real question is: does Bayes still have a use case in 2023? Is it still used with significant scores, or just to flag messages for review?
Re: Strange findings debugging bayes results
On Thu, Feb 16, 2023 at 01:02:25PM +0200, Henrik K wrote:
> On Thu, Feb 16, 2023 at 10:18:50AM +0100, hg user wrote:
> > Every score is based on headers, very generic headers. and some
> > related to my setup.
> >
> > Not a single token from the message body
>
> The Bayes implementation has been practically unmaintained for a long time,
> so YMMV.
>
> You can try something like this, most headers are parsed badly and generate
> biasing random garbage (unscientific observation):
>
> bayes_ignore_header ARC-Authentication-Results
> bayes_ignore_header ARC-Message-Signature

Yeah, Bayes on headers (and CSS/HTML stuff) has caused me far more misclassifications than good, so I eventually gave up on updating an ever-growing bayes_ignore_header list and disabled Bayes on the headers altogether:

bayes_token_sources none visible uri mimepart

My stance being: if the end user would not be classifying on those sources (except the Subject header), neither should automatic Bayes classification...

Perhaps the OP has a bayes_token_sources setting that takes only headers into account?

https://man.archlinux.org/man/Mail::SpamAssassin::Conf.3pm.en#bayes_token_sources

-- 
Opinions above are GNU-copylefted.
Re: Strange findings debugging bayes results
I've updated 23_bayes_ignore_header.cf (last update was from 2016 :) https://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/sandbox/axb/23_bayes_ignore_header.cf Axb On 2/16/23 14:17, Dave Wreski wrote: Here's also another 50+ headers we've collected over the years that I believe started as a list from AXB 10+ years ago. https://pastebin.com/raw/f6Fwh8HJ
Re: Strange findings debugging bayes results
Hi,

Here's also another 50+ headers we've collected over the years that I believe started as a list from AXB 10+ years ago.

https://pastebin.com/raw/f6Fwh8HJ

dave

-- 
Dave Wreski
President & CEO
Guardian Digital, Inc.
We Make Email Safe
640-800-9446
dwre...@guardiandigital.com
https://guardiandigital.com
103 Godwin Ave, Suite 314, Midland Park, NJ 07432
Re: Strange findings debugging bayes results
On Thu, Feb 16, 2023 at 10:18:50AM +0100, hg user wrote:
> I was investigating a bunch of bitcoin spam: different titles,
> different senders (all from gmail), different text, different pdf
> attachment.
>
> Unfortunately in those days my bayes db was polluted and they all got
> a BAYES_50, 0.8.
>
> I tested the messages now with a recreated bayes db and got some
> BAYES_999. So I dug to understand if I already saw the spam...
>
> But the debug result was unpleasant:
> dbg: bayes: tokenized header: 92 tokens
> dbg: bayes: token 'HX-Received:Jan' => 0.998028449502134
> dbg: bayes: token 'HX-Google-DKIM-Signature:20210112' => 0.997244532803181
> dbg: bayes: token 'H*r:sk:' => 0.997244532803181
> dbg: bayes: token 'H*r:a05' => 0.995425742574258
> dbg: bayes: token 'HAuthentication-Results:sk:.' => 0.986543689320388
> dbg: bayes: token 'HX-Google-DKIM-Signature:reply-to' => 0.916110175863517
> dbg: bayes: token 'H*r:2002' => 0.877842810325844
> dbg: bayes: token 'HAuthentication-Results:2048-bit' => 0.858520043212023
> dbg: bayes: token 'HAuthentication-Results:pass' => 0.855193895034317
> dbg: bayes: score = 0.97915091326
>
> Every score is based on headers, very generic headers. and some
> related to my setup.
>
> Not a single token from the message body

The Bayes implementation has been practically unmaintained for a long time, so YMMV.

You can try something like this, most headers are parsed badly and generate biasing random garbage (unscientific observation):

bayes_ignore_header ARC-Authentication-Results
bayes_ignore_header ARC-Message-Signature
bayes_ignore_header ARC-Seal
bayes_ignore_header Authentication-Results
bayes_ignore_header Autocrypt
bayes_ignore_header IronPort-SDR
bayes_ignore_header suggested_attachment_session_id
bayes_ignore_header X-Brightmail-Tracker
bayes_ignore_header X-Exchange-Antispam-Report-CFA-Test
bayes_ignore_header X-Forefront-Antispam-Report
bayes_ignore_header X-Forefront-Antispam-Report-Untrusted
bayes_ignore_header X-Gm-Message-State
bayes_ignore_header X-Google-DKIM-Signature
bayes_ignore_header x-microsoft-antispam
bayes_ignore_header X-Microsoft-Antispam-Message-Info
bayes_ignore_header X-Microsoft-Antispam-Message-Info-Original
bayes_ignore_header X-Microsoft-Antispam-Untrusted
bayes_ignore_header X-Microsoft-Exchange-Diagnostics
bayes_ignore_header x-ms-exchange-antispam-messagedata
bayes_ignore_header x-ms-exchange-antispam-messagedata-0
bayes_ignore_header x-ms-exchange-crosstenant-id
bayes_ignore_header x-ms-exchange-crosstenant-network-message-id
bayes_ignore_header x-ms-exchange-crosstenant-rms-persistedconsumerorg
bayes_ignore_header X-MS-Exchange-CrossTenant-userprincipalname
bayes_ignore_header x-ms-exchange-slblob-mailprops
bayes_ignore_header x-ms-office365-filtering-correlation-id
bayes_ignore_header X-MSFBL
bayes_ignore_header X-Provags-ID
bayes_ignore_header X-SG-EID
bayes_ignore_header X-SG-ID
bayes_ignore_header X-UI-Out-Filterresults
bayes_ignore_header X-ClientProxiedBy
bayes_ignore_header X-MS-Exchange-CrossTenant-FromEntityHeader
bayes_ignore_header X-OriginatorOrg
bayes_ignore_header X-MS-Exchange-CrossTenant-OriginalArrivalTime
bayes_ignore_header X-MS-TrafficTypeDiagnostic
bayes_ignore_header X-MS-Exchange-CrossTenant-AuthAs
bayes_ignore_header X-MS-Exchange-Transport-CrossTenantHeadersStamped
bayes_ignore_header X-MS-Exchange-CrossTenant-AuthSource
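Since several such lists are floating around in this thread, a small helper for merging them into one set of config lines may save some hand editing. This is just a sketch: the function name is made up, and collapsing names that differ only in case rests on header field names being case-insensitive in mail. Whether SpamAssassin itself matches bayes_ignore_header case-insensitively is not verified here.

```python
def ignore_header_lines(names):
    """Turn header names into bayes_ignore_header lines, without duplicates.

    Names that differ only in case are treated as the same header; the
    first spelling seen is kept, and the original order is preserved.
    """
    seen = set()
    lines = []
    for name in names:
        name = name.strip()
        key = name.lower()
        if not name or key in seen:
            continue
        seen.add(key)
        lines.append(f"bayes_ignore_header {name}")
    return lines


# Hypothetical merge of two overlapping lists.
merged = ignore_header_lines(
    ["ARC-Seal", "X-SG-EID", "arc-seal", "  X-MSFBL  ", ""]
)
print("\n".join(merged))
```

The output can be pasted into local.cf or into a separate .cf file of its own.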
Strange findings debugging bayes results
I was investigating a bunch of bitcoin spam: different titles, different senders (all from gmail), different text, different pdf attachment.

Unfortunately in those days my bayes db was polluted and they all got a BAYES_50, 0.8.

I tested the messages now with a recreated bayes db and got some BAYES_999. So I dug to understand if I already saw the spam...

But the debug result was unpleasant:

dbg: bayes: tokenized header: 92 tokens
dbg: bayes: token 'HX-Received:Jan' => 0.998028449502134
dbg: bayes: token 'HX-Google-DKIM-Signature:20210112' => 0.997244532803181
dbg: bayes: token 'H*r:sk:' => 0.997244532803181
dbg: bayes: token 'H*r:a05' => 0.995425742574258
dbg: bayes: token 'HAuthentication-Results:sk:.' => 0.986543689320388
dbg: bayes: token 'HX-Google-DKIM-Signature:reply-to' => 0.916110175863517
dbg: bayes: token 'H*r:2002' => 0.877842810325844
dbg: bayes: token 'HAuthentication-Results:2048-bit' => 0.858520043212023
dbg: bayes: token 'HAuthentication-Results:pass' => 0.855193895034317
dbg: bayes: score = 0.97915091326

Every score is based on headers, very generic headers, and some related to my setup.

Not a single token from the message body.
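To check this on more than one message, the debug output can be split into header-derived and other tokens with a few lines of script. The sketch below assumes, based only on the tokens shown above, that header tokens look like `H<header-name>:` or `H*<letters>:` at the start; that pattern and the function names are assumptions of this example, not a documented format.

```python
import re

# Matches lines such as: dbg: bayes: token 'H*r:sk:' => 0.997244532803181
TOKEN_LINE = re.compile(r"dbg: bayes: token '(?P<token>.*)' => (?P<prob>[0-9.]+)")

# Assumed shape of a header-derived token, inferred from the debug output.
HEADER_TOKEN = re.compile(r"^H(?:\*[A-Za-z]+|[A-Za-z0-9-]+):")


def parse_tokens(debug_text):
    """Extract (token, probability) pairs from SpamAssassin debug output."""
    return [
        (m.group("token"), float(m.group("prob")))
        for m in TOKEN_LINE.finditer(debug_text)
    ]


def split_by_source(tokens):
    """Separate tokens that look header-derived from all the others."""
    header = [t for t in tokens if HEADER_TOKEN.match(t[0])]
    other = [t for t in tokens if not HEADER_TOKEN.match(t[0])]
    return header, other


sample = """\
dbg: bayes: token 'HX-Received:Jan' => 0.998028449502134
dbg: bayes: token 'H*r:sk:' => 0.997244532803181
dbg: bayes: token 'HAuthentication-Results:pass' => 0.855193895034317
dbg: bayes: score = 0.97915091326
"""
header, other = split_by_source(parse_tokens(sample))
print(len(header), "header tokens,", len(other), "other tokens")
```

Run over the debug output of a whole mailbox, the ratio of the two counts gives a rough idea of how much the headers dominate the classification.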
Re: bayes in sqlite db
Heh, I know this thread is so old it might as well be dead, but this does work. Note that you may need to apply the patch from Bug 7932 until the next release.

bayes_store_module Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn DBI:SQLite:/path/to/bayes.sqlite

On 5/26/22 9:25 AM, Michael Grant wrote:
> Does anyone have a working example of storing Bayes and user prefs in SQLite?
>
> I only see mysql and postgres schemas in /usr/share/doc/spamassassin/sql/
>
> Michael Grant
bayes in sqlite db
Does anyone have a working example of storing Bayes and user prefs in SQLite?

I only see mysql and postgres schemas in /usr/share/doc/spamassassin/sql/

Michael Grant
Re: rules for a sneaky SPEAR-VIRUS spam that gets past bayes
Just off the top of my head:

rawbody   ONEDRIVE_DOWNLOAD   m'https://onedrive\.live\.com/download[?]cid='
score     ONEDRIVE_DOWNLOAD   0.5
describe  ONEDRIVE_DOWNLOAD   Download link to a file on Onedrive

Personally I'd be inclined to put an i on the end of that.

body      FILE_PWD_INFO       /\b(?:Fil lösenord|File password):\s[A-Z]{2}\d{4}\b/
score     FILE_PWD_INFO       3
describe  FILE_PWD_INFO       Email has a password to an archive file

meta      PWD_ONEDRIVE_DLOAD  ONEDRIVE_DOWNLOAD && FILE_PWD_INFO
score     PWD_ONEDRIVE_DLOAD  4
describe  PWD_ONEDRIVE_DLOAD  Email contains download for passworded Onedrive file

Loren
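Before loading rules like these, the two patterns can be tried against a few sample strings outside of SpamAssassin. The sketch below does that with Python's `re` module. It only exercises the regular expressions: how SpamAssassin renders a message for body and rawbody rules is not modelled, and the sample texts are made up.

```python
import re

# The same patterns as in the rules above, without the rule delimiters.
ONEDRIVE_DOWNLOAD = re.compile(r"https://onedrive\.live\.com/download[?]cid=")
FILE_PWD_INFO = re.compile(r"\b(?:Fil lösenord|File password):\s[A-Z]{2}\d{4}\b")


def pwd_onedrive_dload(text):
    """Mimic the meta rule: both sub-patterns must match somewhere in text."""
    return bool(ONEDRIVE_DOWNLOAD.search(text)) and bool(FILE_PWD_INFO.search(text))


# Hypothetical sample texts.
spam_like = "See https://onedrive.live.com/download?cid=EXAMPLE\nFile password: AB1234"
swedish = "https://onedrive.live.com/download?cid=EXAMPLE Fil lösenord: XY9876"
link_only = "Shared file: https://onedrive.live.com/download?cid=EXAMPLE"
lowercase_pwd = "https://onedrive.live.com/download?cid=EXAMPLE File password: ab1234"

for sample in (spam_like, swedish, link_only, lowercase_pwd):
    print(pwd_onedrive_dload(sample))
```

As written, both patterns are case-sensitive. Adding an "i" to the ONEDRIVE_DOWNLOAD rule, as suggested above, would correspond to passing `re.IGNORECASE` when compiling that pattern.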
rules for a sneaky SPEAR-VIRUS spam that gets past bayes
This is a sneaky SPEAR-VIRUS spam that gets past Bayes because legitimate content from hijacked emails is copied into the spam, making it look like a follow-up message in an existing legitimate conversation. Catch it using the rules below. (Perhaps add more to this to prevent rare FPs? But this is a good start!)

File size < 50 KB

Then, on the decoded/de-MIME'd message, an exact match on:

https://onedrive.live.com/download?cid=

Then a hit on this regex:

\b(Fil lösenord|File password): [A-Z]{2}\d{4}\b

(I'll let someone else jump in here and create and share the actual SA implementation of this, if desired, along with any suggested improvements.)

-- 
Rob McEwen, invaluement
Re: Question about user specific bayes
On 2022-01-18 22:34, Bill Cole wrote:

> Well, maybe? I don't currently have a system using per-user Bayes and
> it's been a bit since I set one up so hopefully someone who has a
> working rig will speak up...

fuglu has per-user Bayes by default, and it recently fixed an issue where the local part could be mixed case, so a sender could create another Bayes user. Oops. I had hoped this was solved in SpamAssassin core, but maybe in SA 4.0.0.

> Note that SA will try to create an empty DB if none exists.

And if spamd/spamc uses virtual SQL users, or has static DB files for all users with read/write permissions. Ideally, if sqlite3 user prefs are configured, it could be very simple.

> I'm not sure that I can think up a circumstance (other than a
> disappearing user) where fallback to global Bayes would happen.

Is this even supported?

> SA will not fall back to a global Bayes DB just because an otherwise
> perfectly good per-user DB isn't properly seeded.

Good.
RE: Question about user specific bayes
> Note that SA will try to create an empty DB if none exists. I'm not sure that
> I can think up a circumstance (other than a disappearing user) where fallback
> to global Bayes would happen. SA will not fall back to a global Bayes DB
> just because an otherwise perfectly good per-user DB isn't properly seeded.

It doesn't seem to be creating an empty database at all. Not sure why.
Re: Question about user specific bayes
On 2022-01-18 at 13:40:29 UTC-0500 (Tue, 18 Jan 2022 18:40:29 +) Dino Edwards is rumored to have said:

> Hi, thanks for the quick reply. So when amavis calls on SA for an incoming
> message, it will pass the recipient (e-mail address) in the %u variable and
> then SA will take that variable and look in the /opt/sa-bayes-users/%u
> directory for the existence of bayes database and if it finds one, it will
> use it provided it's properly seeded. If not, it will fall back to the
> global bayes. Is that correct?

Well, maybe? I don't currently have a system using per-user Bayes and it's been a bit since I set one up so hopefully someone who has a working rig will speak up...

Note that SA will try to create an empty DB if none exists. I'm not sure that I can think up a circumstance (other than a disappearing user) where fallback to global Bayes would happen. SA will not fall back to a global Bayes DB just because an otherwise perfectly good per-user DB isn't properly seeded.

> -----Original Message-----
> From: Bill Cole
> Sent: Tuesday, January 18, 2022 12:23 PM
> To: users@spamassassin.apache.org
> Subject: Re: Question about user specific bayes
>
> On 2022-01-18 at 11:12:01 UTC-0500 (Tue, 18 Jan 2022 16:12:01 +) Dino Edwards is rumored to have said:
>
> > Hi,
> >
> > Trying to implement user specific bayes. My current setup is setup as
> > follows in regards to global bayes. I'm also using amavis:
> >
> > bayes_path /opt/sa-bayes/bayes
> > bayes_file_mode 0777
>
> Don't do that anywhere. It's not safe.
>
> > use_bayes 1
> > use_bayes_rules 1
> > bayes_auto_learn 0
> > bayes_auto_learn_threshold_spam 15
> > bayes_auto_learn_threshold_nonspam -5
> [...]
> >
> > and it did seem to create bayes_toks and bayes_seen files under the
> > /opt/sa-bayes-users/b...@domain.tld directory as expected.
>
> So, it is working.
>
> > Is this all that's required to get this working?
>
> Yes
>
> > What happens to the global bayes file in local.cf? Is that no longer
> > used?
>
> I believe that it would be used if for some reason SA couldn't figure out
> which user to pick for a scan at runtime. Maybe if spamd was launched as a
> user that was later deleted?
>
> But generally, working per-user Bayes setup makes the global file pointless
> and unused.
>
> > How do the following settings from the local.cf figure in the user
> > specific bayes files?
> >
> > use_bayes 1
> > use_bayes_rules 1
> > bayes_auto_learn 0
> > bayes_auto_learn_threshold_spam 15
> > bayes_auto_learn_threshold_nonspam -5
>
> The local.cf file is loaded before user_prefs, which is the last config file
> loaded, so anything that can be changed in user_prefs (i.e. all of those, I
> believe) which is set in user_prefs will 'stick'
>
> Note that in this case you're choosing to disable auto-learn, so the
> threshold values are never used.
>
> > Do the user specific bayes have the same requirements to train them
> > with at least 200 messages?
>
> Yes. Each Bayes DB must be seeded before it can be used. You should also
> plan a way to regularly feed known spam and ham to those databases, since
> you aren't auto-learning.
>
> > before they start working?
>
> Before SA will determine a Bayes score on incoming messages, yes.
>
> --
> Bill Cole
> b...@scconsult.com or billc...@apache.org
> (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
> Not Currently Available For Hire

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
RE: Question about user specific bayes
Hi, thanks for the quick reply. So when amavis calls on SA for an incoming message, it will pass the recipient (e-mail address) in the %u variable and then SA will take that variable and look in the /opt/sa-bayes-users/%u directory for the existence of bayes database and if it finds one, it will use it provided it's properly seeded. If not, it will fall back to the global bayes. Is that correct? Thanks -Original Message- From: Bill Cole Sent: Tuesday, January 18, 2022 12:23 PM To: users@spamassassin.apache.org Subject: Re: Question about user specific bayes On 2022-01-18 at 11:12:01 UTC-0500 (Tue, 18 Jan 2022 16:12:01 +) Dino Edwards is rumored to have said: > Hi, > > Trying to implement user specific bayes. My current setup is setup as > follows in regards to global bayes. I'm also using amavis: > > bayes_path /opt/sa-bayes/bayes > bayes_file_mode 0777 Don't do that anywhere. It's not safe. > use_bayes 1 > use_bayes_rules 1 > bayes_auto_learn 0 > bayes_auto_learn_threshold_spam 15 > bayes_auto_learn_threshold_nonspam -5 [...] > > and it did seem to create bayes_toks and bayes_seen files under the > /opt/sa-bayes-users/b...@domain.tld<mailto:/opt/sa-bayes-users/bob@doma > in.tld> > directory as expected. So, it is working. > Is this all that's required to get this working? Yes > What happens to the global bayes file in local.cf? Is that no longer > used? I believe that it would be used if for some reason SA couldn't figure out which user to pick for a scan at runtime. Maybe if spamd was launched as a user that was later deleted? But generally, working per-user Bayes setup makes the global file pointless and unused. > > How do the following settings from the local.cf figure in the user > specific bayes files? 
> > use_bayes 1 > use_bayes_rules 1 > bayes_auto_learn 0 > bayes_auto_learn_threshold_spam 15 > bayes_auto_learn_threshold_nonspam -5 The local.cf file is loaded before user_prefs, which is the last config file loaded, so anything that can be changed in user_prefs (i.e. all of those, I believe) which is set in user_prefs will 'stick' Note that in this case you're choosing to disable auto-learn, so the threshold values are never used. > Do the user specific bayes have the same requirements to train them > with at least 200 messages? Yes. Each Bayes DB must be seeded before it can be used. You should also plan a way to regularly feed known spam and ham to those databases, since you aren't auto-learning. > before they start working? Before SA will determine a Bayes score on incoming messages, yes. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Not Currently Available For Hire
Re: Question about user specific bayes
On 2022-01-18 at 11:12:01 UTC-0500 (Tue, 18 Jan 2022 16:12:01 +)
Dino Edwards is rumored to have said:

> Hi,
>
> Trying to implement user specific bayes. My current setup is setup as
> follows in regards to global bayes. I'm also using amavis:
>
> bayes_path /opt/sa-bayes/bayes
> bayes_file_mode 0777

Don't do that anywhere. It's not safe.

> use_bayes 1
> use_bayes_rules 1
> bayes_auto_learn 0
> bayes_auto_learn_threshold_spam 15
> bayes_auto_learn_threshold_nonspam -5

[...]

> and it did seem to create bayes_toks and bayes_seen files under the
> /opt/sa-bayes-users/b...@domain.tld
> directory as expected.

So, it is working.

> Is this all that's required to get this working?

Yes

> What happens to the global bayes file in local.cf? Is that no longer
> used?

I believe that it would be used if for some reason SA couldn't figure out which user to pick for a scan at runtime. Maybe if spamd was launched as a user that was later deleted? But generally, working per-user Bayes setup makes the global file pointless and unused.

> How do the following settings from the local.cf figure in the user
> specific bayes files?
>
> use_bayes 1
> use_bayes_rules 1
> bayes_auto_learn 0
> bayes_auto_learn_threshold_spam 15
> bayes_auto_learn_threshold_nonspam -5

The local.cf file is loaded before user_prefs, which is the last config file loaded, so anything that can be changed in user_prefs (i.e. all of those, I believe) which is set in user_prefs will 'stick'

Note that in this case you're choosing to disable auto-learn, so the threshold values are never used.

> Do the user specific bayes have the same requirements to train them
> with at least 200 messages?

Yes. Each Bayes DB must be seeded before it can be used. You should also plan a way to regularly feed known spam and ham to those databases, since you aren't auto-learning.

> before they start working?

Before SA will determine a Bayes score on incoming messages, yes.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
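The load-order point in this reply (local.cf first, user_prefs last, so the later value sticks) can be pictured as a last-setting-wins merge. What follows is only a simplified Python model, not SpamAssassin's actual parser: the function names and the sample user_prefs content are hypothetical, comment handling is naive, and it ignores any restrictions on which settings a user is allowed to override.

```python
def parse_settings(text):
    """Parse 'name value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # naive comment stripping
        if not line:
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def effective_settings(*config_texts):
    """Merge configs in load order; a later file overrides an earlier one."""
    merged = {}
    for text in config_texts:
        merged.update(parse_settings(text))
    return merged


# Values from the local.cf quoted in this thread.
LOCAL_CF = """
use_bayes 1
use_bayes_rules 1
bayes_auto_learn 0
bayes_auto_learn_threshold_spam 15
bayes_auto_learn_threshold_nonspam -5
"""

# Hypothetical per-user override, for illustration only.
USER_PREFS = """
bayes_auto_learn 1
"""

if __name__ == "__main__":
    for name, value in effective_settings(LOCAL_CF, USER_PREFS).items():
        print(name, value)
```

In this model a user who sets bayes_auto_learn in user_prefs ends up with that value, while the thresholds from local.cf carry over unchanged.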
Question about user specific bayes
Hi,

I'm trying to implement user-specific Bayes. My current global Bayes setup is as follows (I'm also using amavis):

bayes_path /opt/sa-bayes/bayes
bayes_file_mode 0777
use_bayes 1
use_bayes_rules 1
bayes_auto_learn 0
bayes_auto_learn_threshold_spam 15
bayes_auto_learn_threshold_nonspam -5

Based on various things I've read online, I've set the following in /etc/default/spamassassin in an attempt to set up user-specific Bayes:

OPTIONS="--create-prefs --max-children 5 --helper-home-dir=/opt/sa-bayes-users/%u -x -u amavis"

I've also created a bunch of subdirectories named after the users under /opt/sa-bayes-users. Example:

/opt/sa-bayes-users/b...@domain.tld
/opt/sa-bayes-users/la...@domain.tld
Etc...

I've set the owner of /opt/sa-bayes-users/ to amavis and the permissions to 700.

I've run a test sa-learn as follows, where /mnt/data/amavis/clean/n/nTutbwTMVWzK is the actual e-mail file I use to train SA:

sa-learn --spam --dbpath /opt/sa-bayes-users/b...@domain.tld /mnt/data/amavis/clean/n/nTutbwTMVWzK

and it did seem to create bayes_toks and bayes_seen files under the /opt/sa-bayes-users/b...@domain.tld directory as expected.

Is this all that's required to get this working?

What happens to the global bayes file in local.cf? Is that no longer used?

How do the following settings from local.cf figure in the user-specific bayes files?

use_bayes 1
use_bayes_rules 1
bayes_auto_learn 0
bayes_auto_learn_threshold_spam 15
bayes_auto_learn_threshold_nonspam -5

Do the user-specific bayes databases have the same requirement to be trained with at least 200 messages before they start working?

Thanks in advance
Re: Starting Clean with Bayes
On Sat, 23 Oct 2021, Benny Pedersen wrote:

> On 2021-10-20 16:58, John Hardin wrote:
>> On Wed, 20 Oct 2021, Axb wrote:
>>> On 10/19/21 8:06 PM, Jerry Malcolm wrote:
>>>> Where do I find a starter toks file?
>>> You don't need a "starter" file.
>> Your Bayes starter is your training corpora, which you should retain
>> in case you ever need to start over from scratch as you're doing now.
>
> no one asked how to make a backup/restore, with imho would have
> answered all this just like one would just use corpus retraining data

A backup is fine for migration. A backup of a database that has gone off the rails is useless.

It's fairly accepted that there's no such thing as a "generic starter Bayes database" due to the variability of people's ham.

-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Are you a mildly tech-literate politico horrified by the level of ignorance demonstrated by lawmakers gearing up to regulate online technology they don't even begin to grasp? Cool. Now you have a tiny glimpse into a day in the life of a gun owner. -- Sean Davis
---
511 days since the first private commercial manned orbital mission (SpaceX)
Re: Starting Clean with Bayes
On 2021-10-20 16:58, John Hardin wrote:

> On Wed, 20 Oct 2021, Axb wrote:
>> On 10/19/21 8:06 PM, Jerry Malcolm wrote:
>>> Where do I find a starter toks file?
>> You don't need a "starter" file.
> Your Bayes starter is your training corpora, which you should retain
> in case you ever need to start over from scratch as you're doing now.

No one asked how to make a backup/restore, which IMHO would have answered all of this, just as one would simply use corpus retraining data.

hmm :) I just wish that not only Bayes could be backed up and restored, but also TxRep and AWL data.

That would make it possible to change from PostgreSQL to Redis if needed. Who will use MySQL or Berkeley DB?
Starting Clean With Bayes
I am starting over with a clean install of SA on an AWS Linux2 EC2. I am struggling with getting Bayes set up correctly. I have a very old bayes_toks file from a Jam Windows install from about 4 years ago. I created a userId for spamd, and I put the bayes_toks file in /home/spamd/bayes. I set the bayes_path in local.cf to /home/spamd/bayes/bayes. I changed the file owner to spamd:spamd. I get the error message:

cannot open bayes databases /home/spamd/bayes/bayes_* R/O: tie failed

I tried running SpamAssassin from the command line with sudo and got the same error, so I don't think it's a permissions issue. I then moved the file out of the folder and now get:

no dbs present, cannot tie DB R/O: /home/spamd/bayes/bayes_toks

So in the first case it finds the file but can't open it. I found some posts on forums that suggested there's a possibility the file is so old that the format is obsolete. Fine with me. At this point, I just want to start clean. But I can't find a way to start using bayes from scratch with no toks file to begin with. I even did another clean install on a separate EC2 to see if SA would create an initial toks file, but I couldn't find one. My old toks file is probably of marginal value now anyway. I just need to know where to find a brand new toks file to put into my bayes_path folder so it can start building up the ham/spam file and start contributing to my SA scores.

Where do I find a starter toks file?

Thx
Re: Starting Clean with Bayes
On Wed, 20 Oct 2021, Axb wrote:

> On 10/19/21 8:06 PM, Jerry Malcolm wrote:
>> Where do I find a starter toks file?
> You don't need a "starter" file.

Your Bayes starter is your training corpora, which you should retain in case you ever need to start over from scratch as you're doing now.

-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
At what point then is the approach of danger to be expected? I answer, if it ever reach us, it must spring up amongst us. It cannot come from abroad. If destruction be our lot, we must ourselves be its author and finisher. As a nation of freemen, we must live through all time, or die by suicide. -- Abraham Lincoln
...popularly summarized as: "America will never be destroyed from the outside. If we falter and lose our freedoms, it will be because we destroyed ourselves."
---
508 days since the first private commercial manned orbital mission (SpaceX)
Re: Starting Clean with Bayes
On 10/19/21 8:06 PM, Jerry Malcolm wrote:

> Where do I find a starter toks file?

You don't need a "starter" file. As soon as it needs them, SA automagically creates the necessary files if it can write into the defined path.

Just feed it some spams and hams as per docs and you'll see the files.
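Since SA can only create the database files when it can write into the configured path, a quick writability check is a reasonable first diagnostic. This is a minimal Python sketch and not part of SpamAssassin: `bayes_dir_is_writable` is a hypothetical helper, and it assumes the file-based store, where bayes_path is a file prefix (so /home/spamd/bayes/bayes means files such as bayes_toks and bayes_seen inside /home/spamd/bayes). It only tells you something useful when run as the same user that runs SA.

```python
import os


def bayes_dir_is_writable(bayes_path):
    """Return True if the directory part of bayes_path exists and is writable.

    bayes_path is treated as a file prefix: /home/spamd/bayes/bayes refers
    to bayes_toks and bayes_seen inside /home/spamd/bayes.
    """
    directory = os.path.dirname(bayes_path)
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    path = "/home/spamd/bayes/bayes"  # bayes_path value from the original post
    if bayes_dir_is_writable(path):
        print("directory is writable; SA should be able to create the files")
    else:
        print("directory is missing or not writable by this user")
```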
Re: Bayes autolearn: how does it resolve whether rules are body or header related?
On Mon, 10 May 2021 20:39:31 +0200 Bert Van de Poel wrote:

> Based on what I've read, I agree that this is indeed a bug (or
> actually several). I've filed the following bug reports:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7904 (missing body
> types, as mentioned by RW)
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7905 (meta
> tflags=net tests are ignored)
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7906 (meta
> tflags!=net tests are always header tests)
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7907 (better
> support for meta tests in autolearning in general, with 2 possible
> solutions)
>
> Thank you very much to RW and Matus Uhlar for helping me figure out
> what code to look at and for al three of you to confirm that this is
> clearly a set of bugs.

I don't agree that they are bugs. I think it would be useful to add the missing body types, but I don't think the rest is hugely wrong, and it's not sensible for anyone to spend a lot of time on it, particularly when it is so easy to turn off the 3+3 test selectively with autolearn_force.

Net meta rules usually contain scored net eval rules, so it's sensible to ignore them. Treating meta rules as header points seems to be erring on the right side. There's a case for ignoring meta rules altogether.

Autolearning is something that's best avoided if at all possible. Erring on the side of avoiding mistraining is a good thing.
bayes stopwords.cf missing ifplugin
ups
Re: Bayes autolearn: how does it resolve whether rules are body or header related?
Dear Loren,

Thank you very much for your email. From your message I could deduce that there were earlier messages (which I then read through a web archive). For some unexplained reason I never received the previous 3 responses to my email. I hope the university network isn't randomly over-filtering spam again (we've had those kinds of problems for a while now, it's quite a problem, we are much more careful about how we mark spam).

Based on what I've read, I agree that this is indeed a bug (or actually several). I've filed the following bug reports:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7904 (missing body types, as mentioned by RW)
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7905 (meta tflags=net tests are ignored)
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7906 (meta tflags!=net tests are always header tests)
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7907 (better support for meta tests in autolearning in general, with 2 possible solutions)

Thank you very much to RW and Matus Uhlar for helping me figure out what code to look at, and to all three of you for confirming that this is clearly a set of bugs. Feel free to file more bugs if you think my issue reveals others, as well as to give support, write suggestions or submit patches on the bugs I have already filed.

Kind regards,
Bert Van de Poel

On 10/05/2021 06:41, Loren Wilton wrote:
>>> so you don't have points from body rules.
>>>
>>> your mentioned URI_DEOBFU_INSTR is a meta rule:
>>>
>>> meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST
>>>
>>> so maybe it's not considered.
>>
>> They are treated as header, or ignored if marked as net.
>
> I think a bug report should be submitted for this. Either they should be
> treated split 50/50 as header and body score, or when the metas are built
> they should have a "body rule" flag, and that used to determine where the
> score goes.
>
> I tried, but for some reason apache decided that I'm evil and blocked the
> submission attempt, so someone else can do it.
>
> Loren
Re: Bayes autolearn: how does it resolve whether rules are body or header related?
>> so you don't have points from body rules.
>>
>> your mentioned URI_DEOBFU_INSTR is a meta rule:
>>
>> meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST
>>
>> so maybe it's not considered.
>
> They are treated as header, or ignored if marked as net.

I think a bug report should be submitted for this. Either they should be split 50/50 between header and body score, or when the metas are built they should have a "body rule" flag, which is then used to determine where the score goes.

I tried, but for some reason apache decided that I'm evil and blocked the submission attempt, so someone else can do it.

Loren
Re: Bayes autolearn: how does it resolve whether rules are body or header related?
On Sun, 9 May 2021 20:03:27 +0200 Matus UHLAR - fantomas wrote: > so you don't have points from body rules. > > your mentioned URI_DEOBFU_INSTR is a meta rule: > > meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST > > so maybe it's not considered. They are treated as header, or ignored if marked as net.
Re: Bayes autolearn: how does it resolve whether rules are body or header related?
On 09.05.21 04:17, Bert Van de Poel wrote: Dear fellow Spamassassin users, I recently noticed that quite a lot of spam emails with high scores weren't marked for Bayes autolearning. While some senders and receivers were a common match, explaining why autolearn was nog, there was no clear explanation for other cases. I therefore put Spamassassin in debug mode to check in more detail, and noticed that fairly often autolearn is not used because the minimum score for body tests isn't achieved. After looking at some specific cases, it seems however that several rules are either not considered when calculating the header rule score and body rule score for Bayes autolearning. I've always presumed these scores are calculated based on whether the underlying rule performs a regex on a header or on the body, but now I'm not so sure any more. I hope you can help clear up whether this is intended behaviour (and what that behaviour is) or whether I should report this as a bug. One example I noticed is URI_DEOBFU_INSTR=3.595. This is if I understand it correctly a URI test that's performed on the body. Should a test like this be counted towards the body score count? Then there's the question of meta rules such as MONEY_NOHTML. If you resolve the different meta levels within this rule, it's a combination of header and body, however it's only counted towards the header score. Finally, it seems as if custom rules I've added within local.cf aren't considered. Is that indeed the case (and if so, is that by design)? I'm also not completely sure if UNWANTED_BODY_LANGUAGE and tests like razor, pyzor and DCC are considered for body scores. Within the same realm, I'm also wondering whether these expected numbers for body and header can be tweaked and if so, how. For example the case below isn't autolearned even though it has a huge score and a vast amount of tests going off, but seemingly not enough body-related scores. Is that really the intended behaviour? 
May 8 10:40:32 mail amavis[4076058]: (4076058-16) header_edits_for_quar: -> , Yes, score=24.619 tag=- tag2=5 kill=7.5 tests=[ADVANCE_FEE_3_NEW_MONEY=0.001, AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_50=0.8, BERT_KULSPAM=1, FORGED_MUA_OUTLOOK=1.927, FREEMAIL_FORGED_REPLYTO=2.095, FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25, FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001, FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001, FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001, FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001, MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202, MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001, PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996, SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593, TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001] autolearn=no autolearn_force=no

Thank you in advance for your help. If you need any more examples or would us to run some tests, then feel free to let me know.

looks like most of those are meta rules:

header FREEMAIL_REPLYTO_END_DIGIT
header MISSING_HEADERS
body BAYES_50
header SPF_HELO_NONE
header FSL_CTYPE_WIN1251
header NSL_RCVD_HELO_USER
header REPTO_419_FRAUD

score FREEMAIL_REPLYTO_END_DIGIT 0.25
score MISSING_HEADERS 0.915 1.207 1.204 1.021
score SPF_HELO_NONE 0.001

so you don't have points from body rules.

your mentioned URI_DEOBFU_INSTR is a meta rule:

meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST

so maybe it's not considered.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux IS user friendly, it's just selective who its friends are...
Re: Bayes autolearn: how does it resolve whether rules are body or header related?
On Sun, 9 May 2021 04:17:26 +0200 Bert Van de Poel wrote:

> Within the same realm, I'm also wondering whether these expected
> numbers for body and header can be tweaked and if so, how.

You can create a meta-rule for definite spam and set:

tflags autolearn_force

a hit on any rule with this flag set causes the 3+3 check to be ignored. It does nothing else.

One thing that does look wrong is that maybe_body_only() looks for:

(($type == $TYPE_BODY_TESTS) || ($type == $TYPE_BODY_EVALS) ||
 ($type == $TYPE_URI_TESTS) || ($type == $TYPE_URI_EVALS))

so it's missing any rawbody and full rules. Specifically Pyzor, Razor2 and DCC are full eval rules.
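The "3+3 check" discussed in this thread can be illustrated with a small model. This is a hedged Python sketch of the behaviour as described here, not a port of SpamAssassin's code: the `Hit` tuple, the kind names and the function are all hypothetical. The body kinds mirror the four types in the maybe_body_only() excerpt, meta rules count as header points unless flagged net, and rawbody/full rules count towards neither side. The default threshold of 12 and the 3-point minimums are assumptions taken from the documented defaults, and the total is a plain sum of the hits passed in, whereas SA computes its learning score with further exclusions that this sketch leaves out.

```python
from collections import namedtuple

# kind: "header", "header_eval", "meta", "body", "body_eval",
#       "uri", "uri_eval", "rawbody" or "full"
# net:   True for rules flagged tflags net
# force: True for rules flagged tflags autolearn_force
Hit = namedtuple("Hit", "name score kind net force")

# The four types recognised by the quoted maybe_body_only() check.
BODY_KINDS = {"body", "body_eval", "uri", "uri_eval"}
HEADER_KINDS = {"header", "header_eval", "meta"}


def would_autolearn_spam(hits, threshold=12.0, min_header=3.0, min_body=3.0):
    """Simplified model of the spam side of the autolearn decision."""
    total = sum(hit.score for hit in hits)
    header_points = body_points = 0.0
    for hit in hits:
        if hit.kind == "meta" and hit.net:
            continue  # net meta rules count towards neither side
        if hit.kind in HEADER_KINDS:
            header_points += hit.score
        elif hit.kind in BODY_KINDS:
            body_points += hit.score
        # rawbody and full rules fall through: neither header nor body
    if total < threshold:
        return False
    if any(hit.force for hit in hits):
        return True  # autolearn_force skips the 3+3 requirement only
    return header_points >= min_header and body_points >= min_body
```

With this model, a message carried almost entirely by header and meta rules is rejected for autolearning however high its total is, which matches the behaviour reported at the start of the thread.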
Bayes autolearn: how does it resolve whether rules are body or header related?
Dear fellow Spamassassin users,

I recently noticed that quite a lot of spam emails with high scores weren't marked for Bayes autolearning. While some senders and receivers were a common match, explaining why autolearn was 'no', there was no clear explanation for other cases. I therefore put Spamassassin in debug mode to check in more detail, and noticed that fairly often autolearn is not used because the minimum score for body tests isn't achieved. After looking at some specific cases, however, it seems that several rules are not considered when calculating the header rule score and body rule score for Bayes autolearning. I've always presumed these scores are calculated based on whether the underlying rule performs a regex on a header or on the body, but now I'm not so sure any more. I hope you can help clear up whether this is intended behaviour (and what that behaviour is) or whether I should report this as a bug.

One example I noticed is URI_DEOBFU_INSTR=3.595. If I understand it correctly, this is a URI test that's performed on the body. Should a test like this be counted towards the body score? Then there's the question of meta rules such as MONEY_NOHTML. If you resolve the different meta levels within this rule, it's a combination of header and body, yet it's only counted towards the header score. Finally, it seems as if custom rules I've added within local.cf aren't considered. Is that indeed the case (and if so, is that by design)? I'm also not completely sure whether UNWANTED_BODY_LANGUAGE and tests like razor, pyzor and DCC are considered for body scores.

Within the same realm, I'm also wondering whether these expected numbers for body and header can be tweaked and if so, how. For example, the case below isn't autolearned even though it has a huge score and a vast number of tests going off, but seemingly not enough body-related scores. Is that really the intended behaviour?

May 8 10:40:32 mail amavis[4076058]: (4076058-16) header_edits_for_quar: -> , Yes, score=24.619 tag=- tag2=5 kill=7.5 tests=[ADVANCE_FEE_3_NEW_MONEY=0.001, AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_50=0.8, BERT_KULSPAM=1, FORGED_MUA_OUTLOOK=1.927, FREEMAIL_FORGED_REPLYTO=2.095, FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25, FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001, FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001, FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001, FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001, MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202, MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001, PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996, SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593, TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001] autolearn=no autolearn_force=no

Thank you in advance for your help. If you need any more examples or would like us to run some tests, then feel free to let me know.

Kind regards,
Bert Van de Poel
ULYSSIS
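To see where a score like this one comes from, the tests=[...] list can be split into rule/score pairs. A minimal Python sketch follows; `parse_tests` is a hypothetical helper, and the regular expression assumes the comma-separated NAME=score layout seen in the amavis log line of this post. The sample is a shortened subset of that line.

```python
import re


def parse_tests(log_line):
    """Return {rule_name: score} from an amavis 'tests=[...]' log fragment."""
    match = re.search(r"tests=\[([^\]]*)\]", log_line)
    if not match:
        return {}
    scores = {}
    for item in match.group(1).split(","):
        name, _, value = item.strip().partition("=")
        if name and value:
            scores[name] = float(value)
    return scores


# Shortened subset of the log line from the post.
SAMPLE = ("tests=[BAYES_50=0.8, MONEY_NOHTML=2.497, PYZOR_CHECK=1.392, "
          "REPTO_419_FRAUD=2.996, SPF_HELO_NONE=0.001] autolearn=no")

if __name__ == "__main__":
    scores = parse_tests(SAMPLE)
    # Print rules from highest to lowest score, then the subtotal.
    for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{value:7.3f}  {name}")
    print(f"{sum(scores.values()):7.3f}  total of listed rules")
```

Applied to the full line, the listed scores add up to the reported 24.619, so the log accounts for every point; the open question in this thread is only how those points are divided between header and body.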
Re: SA's bayes with the Redis backend?
On 2021-02-11 12:58 pm, Alex wrote:

> Hi,
>
>> I've had good luck with using mariadb and galera to share the spamassassin
>> database across systems. I run a small 3-node setup for email, 2x servers
>> running dovecot replicating to each other, and a 3rd galera quorum server.
>> Mariadb is master-master across all 3 nodes, so changes on any one are
>> replicated to all the others via vpn. Works well, and for the amount of data
>> in the spamassassin database, it replicates very quickly.
>
> This sounds very interesting to me. Can you share more details about
> your configuration? I haven't worked with galera before, but have some
> experience with mariadb - it's currently set up as a single master
> with the actual mail relays being set up as slaves. I'd imagine the
> first thing is to convert them all to masters...
>
> Any help would be greatly appreciated.
> Thanks,
> Alex

Sure. I have a (fairly complex) ansible playbook that sets up the whole 3-node cluster, but here are the relevant details.

This is the galera portion of _/etc/mysql/my.cnf_

> #
> # * Galera-related settings
> #
> [galera]
> bind-address = 0.0.0.0
> binlog_format = row
> default_storage_engine = InnoDB
> innodb_autoinc_lock_mode = 2
> innodb_flush_log_at_trx_commit = 2
> wsrep_cluster_address = gcomm://master.vpn,dove1.vpn,dove2.vpn
> wsrep_node_name = master.vpn
>
> # Need to specify vpn address here, not public address
> wsrep_node_address = 192.168.100.50
>
> wsrep_cluster_name = my_cluster
> wsrep_on = 1
> wsrep_provider = /usr/lib/galera/libgalera_smm.so
> wsrep_sst_auth = "root:my_sekrit_password"

And this is the ansible role mariadb_restart/tasks/main.yml. This gets called whenever another mariadb-affecting task sets the DO_RESTART variable. This will cleanly restart the whole mariadb galera cluster.

> - become: yes
>   block:
>
>     - name: Check status of mysqld
>       command: systemctl status mysql
>       ignore_errors: yes
>       changed_when: false
>       register: mysql_status
>
>     - name: Gracefully stop mysql on all nodes to start up cluster
>       service:
>         name: "mysql"
>         state: "stopped"
>       register: mysql_stopped
>       when: mysql_status is succeeded
>
>     - name: Force kill mysqld if stuck in starting state
>       command: pkill -9 mysqld
>       ignore_errors: yes
>       changed_when: false
>       when: mysql_stopped is failed
>
>     - name: Clear RAM caches to free up space
>       command: sysctl -w vm.drop_caches=3
>       when: ansible_virtualization_type != "openvz"
>       changed_when: false
>
>     - name: Check if grastate.dat file exists for bootstrapping node0
>       stat:
>         path: /var/lib/mysql/grastate.dat
>       register: grastate_exists
>       when: inventory_hostname == play_hosts[0]
>
>     - name: Force node0 to be a new bootstrap node
>       lineinfile:
>         dest: /var/lib/mysql/grastate.dat
>         regexp: 'safe_to_bootstrap: 0'
>         line: 'safe_to_bootstrap: 1'
>       when:
>         - inventory_hostname == play_hosts[0]
>         - grastate_exists.stat.exists
>
>     - name: bootstrap a new cluster with galera_new_cluster
>       shell: /usr/bin/galera_new_cluster
>       when: inventory_hostname == play_hosts[0]
>
>     - name: add slave nodes to the cluster
>       service:
>         name: "mysql"
>         state: "started"
>       when: inventory_hostname != play_hosts[0]
>
>     - name: Stop mysql on node0
>       service:
>         name: "mysql"
>         state: "stopped"
>       when: inventory_hostname == play_hosts[0]
>
>     - name: re-add master node to the cluster
>       service:
>         name: "mysql"
>         state: "started"
>       when: inventory_hostname == play_hosts[0]
>
>   #
>   # block
>   when: do_restart | bool

Argh - formatting got messed up. But you get the idea.

It can also be run via a small script that runs ansible-playbook like this

> ansible-playbook -K -i hosts mariadb.yml --extra-vars "{do_restart: true}"

In the list of hosts, node0 is the master or quorum node. The other two are the dovecot replication nodes (dovecot, exim4, roundcube, etc).

I have a bash alias to check on cluster status ...

> alias cluster='mysql -B -s -N -e "show status like \"%wsrep_cluster%\";"'
>
> dc@master:~$ cluster
> wsrep_cluster_weight 3
> wsrep_cluster_capabilities
> wsrep_cluster_conf_id 10
> wsrep_cluster_size 3
> wsrep_cluster_state_uuid 480b440d-6643-11eb-94bc-5b47cf0676a8
> wsrep_cluster_status Primary
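The cluster status output in this message lends itself to a simple automated health check. A minimal Python sketch, assuming the two-column name/value layout shown in the message; `parse_wsrep_status` and `cluster_is_healthy` are hypothetical helpers, and the expected size of 3 simply reflects the 3-node setup described here.

```python
def parse_wsrep_status(output):
    """Parse 'name value' lines; a name without a value maps to ''."""
    status = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if parts:
            status[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return status


def cluster_is_healthy(status, expected_size=3):
    """True if this node sees a Primary component with all expected nodes."""
    return (status.get("wsrep_cluster_status") == "Primary"
            and status.get("wsrep_cluster_size") == str(expected_size))


# Output copied from the message.
SAMPLE = """\
wsrep_cluster_weight 3
wsrep_cluster_capabilities
wsrep_cluster_conf_id 10
wsrep_cluster_size 3
wsrep_cluster_state_uuid 480b440d-6643-11eb-94bc-5b47cf0676a8
wsrep_cluster_status Primary
"""

if __name__ == "__main__":
    status = parse_wsrep_status(SAMPLE)
    print("healthy" if cluster_is_healthy(status) else "degraded")
```

A node that reports a smaller cluster size, or any status other than Primary, would be flagged as degraded by this check.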
Re: SA's bayes with the Redis backend?
On Thursday 11 February 2021 at 17:21:41, deano-spamassas...@areyes.com wrote: > Is there an easy/efficient way of converting an existing mariadb bayes > database to redis? > > Perhaps "sa-learn --backup", set up redis, then restore? https://www.mail-archive.com/users@spamassassin.apache.org/msg107512.html answers this for you, I think :) Antony. -- There are two possible outcomes: If the result confirms the hypothesis, then you've made a measurement. If the result is contrary to the hypothesis, then you've made a discovery. - Enrico Fermi Please reply to the list; please *don't* CC me.
Re: SA's bayes with the Redis backend?
On 2021-02-11 9:54 am, Alex wrote:

> Hi,
>
>>> There is no real question, but what I would like to find out is (and to
>>> ask), does it scale and are any pitfalls?
>>> Naturally, we would look at doing HA, but am asking for that any
>>> comment, any tip, any opinion on using redis for bayes.
>>
>> Been using it from day one (I'm party to blame we have this) and it
>> scales VERY well. Bayes processing bottleneck has become a thing of the
>> past.
>> Pifalls? none so far.
>>
>> I wouldn't go back anymore.
>> Obviously, it's global only, no per user.
>
> Is there an easy/efficient way of converting an existing mariadb bayes
> database to redis?
>
> Perhaps "sa-learn --backup", set up redis, then restore?
>
> I know I've been less than successful in the past when migrating from
> one version of mariadb to another, so just wondering how successful
> this approach would be.
>
> The problem I'm having with bayes in mariadb is being able to use a
> central database server for the database, while reading and updating
> it from remote systems. Will redis solve this problem?
>
> # sa-learn --dump magic
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 11083 0 non-token data: nspam
> 0.000 0 48363 0 non-token data: nham
> 0.000 0 3709015 0 non-token data: ntokens
> 0.000 0 1372117134 0 non-token data: oldest atime
> 0.000 0 1613055126 0 non-token data: newest atime
> 0.000 0 0 0 non-token data: last journal sync atime
> 0.000 0 1606461007 0 non-token data: last expiry atime
> 0.000 0 0 0 non-token data: last expire atime delta
> 0.000 0 0 0 non-token data: last expire reduction count

I've had good luck with using mariadb and galera to share the spamassassin database across systems. I run a small 3-node setup for email, 2x servers running dovecot replicating to each other, and a 3rd galera quorum server. Mariadb is master-master across all 3 nodes, so changes on any one are replicated to all the others via vpn. Works well, and for the amount of data in the spamassassin database, it replicates very quickly.
Re: SA's bayes with the Redis backend?
Hi,

>> There is no real question, but what I would like to find out is (and to
>> ask), does it scale and are there any pitfalls?
>> Naturally, we would look at doing HA, but I am asking for any
>> comment, any tip, any opinion on using redis for bayes.
>
> Been using it from day one (I'm partly to blame we have this) and it
> scales VERY well. Bayes processing bottleneck has become a thing of the
> past.
> Pitfalls? None so far.
>
> I wouldn't go back anymore.
> Obviously, it's global only, no per user.

Is there an easy/efficient way of converting an existing mariadb bayes database to redis? Perhaps "sa-learn --backup", set up redis, then restore? I know I've been less than successful in the past when migrating from one version of mariadb to another, so just wondering how successful this approach would be.

The problem I'm having with bayes in mariadb is being able to use a central database server for the database, while reading and updating it from remote systems. Will redis solve this problem?

# sa-learn --dump magic
0.000 0          3 0 non-token data: bayes db version
0.000 0      11083 0 non-token data: nspam
0.000 0      48363 0 non-token data: nham
0.000 0    3709015 0 non-token data: ntokens
0.000 0 1372117134 0 non-token data: oldest atime
0.000 0 1613055126 0 non-token data: newest atime
0.000 0          0 0 non-token data: last journal sync atime
0.000 0 1606461007 0 non-token data: last expiry atime
0.000 0          0 0 non-token data: last expire atime delta
0.000 0          0 0 non-token data: last expire reduction count
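The counters in the `sa-learn --dump magic` output above are a convenient sanity check before and after moving a Bayes database to another backend. What follows is only a minimal sketch, assuming the five-column layout visible in the dump (with the value of each "non-token data" row in the third column); `parse_dump_magic` is a hypothetical helper name, not something SpamAssassin ships.

```python
def parse_dump_magic(text):
    """Return {name: value} for the 'non-token data' rows of a magic dump.

    Assumption: each row looks like
    '0.000 0 <value> 0 non-token data: <name>', as in the dump quoted above.
    """
    marker = "non-token data: "
    result = {}
    for line in text.splitlines():
        if marker not in line:
            continue  # not a magic row
        numbers, name = line.split(marker, 1)
        fields = numbers.split()
        if len(fields) < 3:
            continue  # malformed row, skip it rather than guess
        result[name.strip()] = int(fields[2])
    return result


# Sample rows copied from the dump in this thread.
SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 11083 0 non-token data: nspam
0.000 0 48363 0 non-token data: nham
0.000 0 3709015 0 non-token data: ntokens
"""

if __name__ == "__main__":
    stats = parse_dump_magic(SAMPLE)
    print(stats["nspam"], stats["nham"], stats["ntokens"])
```

Running the same parse on the old and the new backend and comparing `nspam`, `nham` and `ntokens` would show whether a migration carried everything over.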
Re: SA's bayes with the Redis backend?
Hi Brent,

On 2/10/21 12:21 PM, Brent Clark wrote:

> Good day Guys
>
> I just want to check with the community, is there anybody using SA's bayes with the Redis backend? I work at a largish ISP, so we're talking lots of mail.
>
> There is no real question, but what I would like to find out is (and to ask), does it scale and are there any pitfalls? Naturally, we would look at doing HA, but I am asking for any comment, any tip, any opinion on using redis for bayes.

Been using it from day one (I'm partly to blame we have this) and it scales VERY well. The Bayes processing bottleneck has become a thing of the past.

Pitfalls? None so far. I wouldn't go back anymore.

Obviously, it's global only, no per user.

Axb
SA's bayes with the Redis backend?
Good day Guys

I just want to check with the community, is there anybody using SA's bayes with the Redis backend? I work at a largish ISP, so we're talking lots of mail.

There is no real question, but what I would like to find out is (and to ask), does it scale and are there any pitfalls? Naturally, we would look at doing HA, but I am asking for any comment, any tip, any opinion on using redis for bayes.

Thanks in advance.

Regards
Brent
Re: Bayes conversion: SQL --> Redis?
On 2/4/2021 5:32 AM, Giovanni Bechis wrote:

> On 2/4/21 10:47 AM, Dan Mahoney (Gushi) wrote:
>
>> Hey there all,
>>
>> In looking at my sql server, it looks like the on-disk size of my MySQL DB's is like 9G (because of InnoDB, it's hard to glean just from the filesystem what tables are which).
>>
>> Anyway, I'd like to move over to a global redis system, but I don't see an easy way to convert from bayes SQL to redis bayes.
>>
>> Is this somewhere and I can't find it?
>
> "sa-learn --backup" with old config and "sa-learn --restore" with new one should do what you need.
>
> Giovanni

Hi Gushi, I also like to use innodb-file-per-table = 1 so I don't have one centralized InnoDB file.
Re: Bayes conversion: SQL --> Redis?
On 2/4/21 10:47 AM, Dan Mahoney (Gushi) wrote:

> Hey there all,
>
> In looking at my sql server, it looks like the on-disk size of my MySQL DB's
> is like 9G (because of InnoDB, it's hard to glean just from the filesystem
> what tables are which).
>
> Anyway, I'd like to move over to a global redis system, but I don't see an
> easy way to convert from bayes SQL to redis bayes.
>
> Is this somewhere and I can't find it?

"sa-learn --backup" with old config and "sa-learn --restore" with new one should do what you need.

Giovanni
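The two steps Giovanni describes can be wrapped in a small script. This is a hedged sketch rather than a tested migration tool: it assumes `sa-learn` is on the PATH, that `sa-learn --backup` writes the dump to standard output, and that you edit the Bayes store settings by hand between the two calls. The function names and the injectable `runner` parameter are hypothetical, added only so the commands can be inspected without touching a real database.

```python
import subprocess


def backup_bayes(backup_path, runner=subprocess.run):
    """Step 1: dump the Bayes data of the currently configured (old) backend."""
    with open(backup_path, "wb") as out:
        # Assumption: --backup prints the dump on stdout, so capture it in a file.
        runner(["sa-learn", "--backup"], stdout=out, check=True)


def restore_bayes(backup_path, runner=subprocess.run):
    """Step 2: load the dump into whatever backend the config now points to."""
    runner(["sa-learn", "--restore", backup_path], check=True)


if __name__ == "__main__":
    dump = "bayes-backup.txt"  # hypothetical file name
    backup_bayes(dump)
    input("Switch the Bayes store in your config to Redis, then press Enter ")
    restore_bayes(dump)
```

Keeping the dump file around after the restore gives you a way back to the SQL store if the Redis setup misbehaves.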
Bayes conversion: SQL --> Redis?
Hey there all,

Looking at my SQL server, the on-disk size of my MySQL DBs is about 9G (because of InnoDB, it's hard to glean just from the filesystem which tables are which).

Anyway, I'd like to move over to a global redis system, but I don't see an easy way to convert from bayes SQL to redis bayes.

Is this somewhere and I can't find it?

-Dan
Re: Error "cannot open bayes databases" lock failed: File exists
On 21.01.21 13:41, Emanuel Gonzalez wrote:

> anyway, the error is still present even with low configuration values.
>
> Jan 21 10:39:43 eternia6 spamd[28053]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Jan 21 10:39:43 eternia6 spamd[28299]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Jan 21 10:39:43 eternia6 spamd[28273]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
>
> Anyone know any way to fix it??

I have mentioned that before, citing from the message you quoted:

> If you process too much mail, you could store the bayes database in SQL or redis.
> However, first lower the number of processes.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: Error "cannot open bayes databases" lock failed: File exists
On Thu, 21 Jan 2021 14:08:59 +0100 Matus UHLAR - fantomas wrote:

> journalling may help a bit, but it makes no sense to parse more mail
> within one CPU at the same time.

That's true provided that everything remains completely CPU limited. The problem is that if you run any network tests and something becomes slow or unreliable, child processes can spend most of their time blocked. If you have multiple processes per core, the throughput can be more reliable. I'd start with 5 processes per core and see how it goes.

> > model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
>
> 4 cores, 8 threads. provided you only have one CPU.
>
> I'd set max-children to 4 and not set min-children, min-spare and
> max-spare at all.

If you do that you implicitly set them to 2, 1 and 2 respectively. If you want a fixed number you can set the min and max values equal.
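The two sizing rules in this exchange (one child per core, or several per core when network tests keep children blocked) differ only in a multiplier, so they are easy to compare side by side. The sketch below is illustrative only: `spamd_child_options` is a hypothetical helper, and reading "set the min and max values equal" as `--min-children` equal to `--max-children` is my assumption.

```python
def spamd_child_options(physical_cores, per_core, fixed=False):
    """Build spamd child-count options from a cores x multiplier rule of thumb.

    per_core=1 mirrors the "max-children 4 on 4 cores" suggestion,
    per_core=5 mirrors the "start with 5 processes per core" suggestion.
    """
    if physical_cores < 1 or per_core < 1:
        raise ValueError("physical_cores and per_core must be at least 1")
    children = physical_cores * per_core
    options = ["--max-children=%d" % children]
    if fixed:
        # Assumption: a fixed pool means min-children equals max-children.
        options.insert(0, "--min-children=%d" % children)
    return " ".join(options)


if __name__ == "__main__":
    cores = 4  # the i7-4790 discussed above has 4 cores / 8 threads
    print(spamd_child_options(cores, per_core=1))
    print(spamd_child_options(cores, per_core=5))
    print(spamd_child_options(cores, per_core=2, fixed=True))
```

Either result is far below the `--max-children=80` setting being debugged in this thread, which is the point both replies make.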
RE: Error "cannot open bayes databases" lock failed: File exists
Anyway, the error is still present even with low configuration values.

Jan 21 10:39:43 eternia6 spamd[28053]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Jan 21 10:39:43 eternia6 spamd[28299]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Jan 21 10:39:43 eternia6 spamd[28273]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists

Does anyone know a way to fix it?

Regards,
Emanuel.
RE: Error "cannot open bayes databases" lock failed: File exists
I'm testing right now. I have lowered the parameters, but in the logs I see an error or warning:

prefork: adjust: 3 idle children more than 2 maximum idle children. Decreasing spamd children: 28057 killed.

Can that message cause slow analysis of emails?

In my infrastructure I have about 10 physical servers with spamassassin; the requests are balanced between them using the keepalived service.

Regards,
Emanuel.
Re: Error "cannot open bayes databases" lock failed: File exists
On 20.01.21 18:31, Emanuel Gonzalez wrote:

> The problem can be generated by the number of processes?

By the number of concurrent processes trying to write to the bayes DB at the same time. Journalling may help a bit, but it makes no sense to parse more mail within one CPU at the same time.

> # Server CPU
> cpu family : 6
> model : 60
> model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

4 cores, 8 threads, provided you only have one CPU.

I'd set max-children to 4 and not set min-children, min-spare and max-spare at all.

... on some systems I disable HT CPUs by disabling them in /etc/sysfs.conf:

devices/system/cpu/cpu4/online = 0
devices/system/cpu/cpu5/online = 0
devices/system/cpu/cpu6/online = 0
devices/system/cpu/cpu7/online = 0

I think since spectre/meltdown it's a good idea, and some systems reported high dummy CPU usage when those were enabled.

> # SpamAssassin
> SPAMDOPTIONS="-u spamd --min-children=30 --max-children=80 --min-spare=25 --max-spare=80 --timeout-child=60 --max-conn-per-child=150
>
>> ohh! too many processes. I don't recommend more spamd processes than e.g. 2x number of CPUs. maybe even less. It does not make sense to run too many processes in parallel. If you process too much mail, you could store bayes database in SQL or redis. However, first lower amount of processes.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
RE: Error "cannot open bayes databases" lock failed: File exists
Can the problem be caused by the number of processes?

# Server CPU
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

# SpamAssassin
SPAMDOPTIONS="-u spamd --min-children=30 --max-children=80 --min-spare=25 --max-spare=80 --timeout-child=60 --max-conn-per-child=150

What change do I need to apply?

Regards,
Emanuel.
Re: Error "cannot open bayes databases" lock failed: File exists
On 20.01.21 14:50, Emanuel Gonzalez wrote:

> Hello Matus, thanks for your reply.
>
> # ls -la /var/spamassassin/bayesdb/bayes
> ls: no se puede acceder a /var/spamassassin/bayesdb/bayes: No existe el fichero o el directorio
>
> I see an error of inexistent file.

Sorry, that was supposed to be:

ls -la /var/spamassassin/bayesdb/

so we can see hidden files too. /var/spamassassin/bayesdb/bayes* does NOT show hidden files.

...however you showed us many lock files, which should explain it.

> # lsof /var/spamassassin/bayesdb/bayes_journal /var/spamassassin/bayesdb/bayes_seen /var/spamassassin/bayesdb/bayes_toks
>
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> spamd 25467 spamd 12r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
> spamd 25470 spamd 15r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
> spamd 25491 spamd 36r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
> spamd 25494 spamd 39r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
> spamd 25502 spamd 47r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
> [...]

Ohh! Too many processes. I don't recommend more spamd processes than e.g. 2x the number of CPUs, maybe even less. It does not make sense to run too many processes in parallel.

If you process too much mail, you could store the bayes database in SQL or redis. However, first lower the number of processes.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: Error "cannot open bayes databases" lock failed: File exists
On Wed, 20 Jan 2021 14:50:53 + Emanuel Gonzalez wrote:

> # lsof /var/spamassassin/bayesdb/bayes_journal /var/spamassassin/bayesdb/bayes_seen /var/spamassassin/bayesdb/bayes_toks
>
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> spamd 25467 spamd 12r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
> spamd 25467 spamd 13r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
> spamd 25470 spamd 15r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
> spamd 25470 spamd 16r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
> ...
> spamd 29921 spamd 192r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
> spamd 29921 spamd 193r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen

Do you actually need so many child processes? You have 40 in Bayes alone and in a previous post you had "--round-robin" with "--max-children=180", i.e. a fixed number of 180 in total.
RE: Error "cannot open bayes databases" lock failed: File exists
Hello,

-rw------- 1 spamd spamd 224 ene 20 13:45 bayes.lock
-rw------- 1 spamd spamd 84 ene 2 01:31 bayes.lock.eternia6.dattaweb.com.11016
-rw------- 1 spamd spamd 224 ene 2 01:31 bayes.lock.eternia6.dattaweb.com.11251
-rw------- 1 spamd spamd 84 ene 2 01:31 bayes.lock.eternia6.dattaweb.com.14855
-rw------- 1 spamd spamd 224 ene 2 01:31 bayes.lock.eternia6.dattaweb.com.16779
-rw------- 1 spamd spamd 224 ene 5 01:37 bayes.lock.eternia6.dattaweb.com.25210
-rw------- 1 spamd spamd 168 ene 20 11:29 bayes.lock.eternia6.dattaweb.com.25620
-rw------- 1 spamd spamd 28 ene 5 01:37 bayes.lock.eternia6.dattaweb.com.25694
-rw------- 1 spamd spamd 28 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.29848
-rw------- 1 spamd spamd 112 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.29852
-rw------- 1 spamd spamd 28 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.29868
-rw------- 1 spamd spamd 224 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.29873
-rw------- 1 spamd spamd 54 ene 15 17:47 bayes.lock.eternia6.dattaweb.com.3018
-rw------- 1 spamd spamd 252 ene 19 11:22 bayes.lock.eternia6.dattaweb.com.30473
-rw------- 1 spamd spamd 252 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31005
-rw------- 1 spamd spamd 252 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31007
-rw------- 1 spamd spamd 224 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31009
-rw------- 1 spamd spamd 112 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31092
-rw------- 1 spamd spamd 112 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31095
-rw------- 1 spamd spamd 196 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31101
-rw------- 1 spamd spamd 196 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31149
-rw------- 1 spamd spamd 112 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31160
-rw------- 1 spamd spamd 252 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31274
-rw------- 1 spamd spamd 140 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31687
-rw------- 1 spamd spamd 168 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31733
-rw------- 1 spamd spamd 56 ene 20 13:54 bayes.lock.eternia6.dattaweb.com.31836
-rw------- 1 spamd spamd 270 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5412
-rw------- 1 spamd spamd 54 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5429
-rw------- 1 spamd spamd 216 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5436
-rw------- 1 spamd spamd 108 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5443
-rw------- 1 spamd spamd 270 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5455
-rw------- 1 spamd spamd 243 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5493
-rw------- 1 spamd spamd 135 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5496
-rw------- 1 spamd spamd 270 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5524
-rw------- 1 spamd spamd 189 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5527
-rw------- 1 spamd spamd 108 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5529
-rw------- 1 spamd spamd 81 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5540
-rw------- 1 spamd spamd 243 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5549
-rw------- 1 spamd spamd 270 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5557
-rw------- 1 spamd spamd 162 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5574
-rw------- 1 spamd spamd 81 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5579
-rw------- 1 spamd spamd 108 ene 18 10:11 bayes.lock.eternia6.dattaweb.com.5582
-rw------- 1 spamd spamd 216 ene 2 01:31 bayes.lock.eternia6.dattaweb.com.9227
-rw------- 1 spamd spamd 720192 ene 20 13:54 bayes_journal
-rwxr-xr-x 1 spamd spamd 172032 dic 18 10:52 bayes_seen
-rwxr-xr-x 1 spamd spamd 5132288 ene 20 13:45 bayes_toks
Re: Error "cannot open bayes databases" lock failed: File exists
On Wed, 20 Jan 2021, Matus UHLAR - fantomas wrote:

> On 20.01.21 11:07, Emanuel Gonzalez wrote:
>
>> Date: Wed, 20 Jan 2021 11:07:59 +
>> From: Emanuel Gonzalez
>> To: SA Mailing list
>> Subject: Re: Error "cannot open bayes databases" lock failed: File exists
>>
>> Hello everyone, i'm back from my vacations, i try solved this problem but i could not.
>>
>> I still see in the spamsassin error logs the mentioned error:
>>
>> bayes_learn_to_journal 1
>> use_bayes yes
>> bayes_path /var/spamassassin/bayesdb/bayes
>> bayes_auto_learn 0
>> bayes_auto_expire 0
>
> try:
>
> ls -la /var/spamassassin/bayesdb/bayes
> lsof /var/spamassassin/bayesdb/bayes_journal /var/spamassassin/bayesdb/bayes_seen /var/spamassassin/bayesdb/bayes_toks

Umm, the command:

ls -la /var/spamassassin/bayesdb/bayes

should get you the error:

ls: cannot access /var/spamassassin/bayesdb/bayes : No such file or directory

On the other hand:

ls -la /var/spamassassin/bayesdb/bayes*

(taken from the bayes_path parameter) should get you what you want. Even better:

ls -la /var/spamassassin/bayesdb/

(to see if there are any leftover lock files in that directory)

-- 
Dave Funk
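Checking that directory for leftover lock files can be partly automated. The sketch below is a rough, Unix-only illustration under stated assumptions: it assumes lock files are named `bayes.lock.<hostname>.<number>` (the pattern visible in listings elsewhere in this thread) and that the trailing number is the PID of the process that created the file. `stale_locks` and `pid_alive` are hypothetical helper names, and the script only reports candidates; it deletes nothing.

```python
import os
import re

# Assumed naming: bayes.lock.<hostname>.<pid>
LOCK_RE = re.compile(r"^bayes\.lock\.(?P<host>.+)\.(?P<pid>\d+)$")


def pid_alive(pid):
    """Return True if a process with this PID exists on the local machine."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def stale_locks(directory, hostname, is_alive=pid_alive):
    """List lock files for this host whose owning process is gone."""
    stale = []
    for name in sorted(os.listdir(directory)):
        match = LOCK_RE.match(name)
        if not match or match.group("host") != hostname:
            continue  # PIDs from other hosts cannot be checked locally
        if not is_alive(int(match.group("pid"))):
            stale.append(name)
    return stale


if __name__ == "__main__":
    import socket
    for name in stale_locks("/var/spamassassin/bayesdb", socket.getfqdn()):
        print("possibly stale:", name)
```

Run it as a user that can read the directory, and stop spamd before removing anything it reports, since a PID can be reused by an unrelated process.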
RE: Error "cannot open bayes databases" lock failed: File exists
Hello Matus, thanks for your reply.

# ls -la /var/spamassassin/bayesdb/bayes
ls: no se puede acceder a /var/spamassassin/bayesdb/bayes: No existe el fichero o el directorio

The error says the file does not exist.

# lsof /var/spamassassin/bayesdb/bayes_journal /var/spamassassin/bayesdb/bayes_seen /var/spamassassin/bayesdb/bayes_toks

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
spamd 25467 spamd 12r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25467 spamd 13r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25470 spamd 15r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25470 spamd 16r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25491 spamd 36r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25491 spamd 37r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25494 spamd 39r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25494 spamd 40r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25502 spamd 47r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25502 spamd 48r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25503 spamd 48r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25503 spamd 49r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25504 spamd 51r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25504 spamd 52r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25506 spamd 51r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25506 spamd 52r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25514 spamd 59r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25514 spamd 60r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25515 spamd 60r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25515 spamd 70r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25520 spamd 68r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25520 spamd 69r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25536 spamd 81r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25536 spamd 82r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25537 spamd 84r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25537 spamd 85r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25542 spamd 87r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25542 spamd 88r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25544 spamd 90r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25544 spamd 91r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25546 spamd 91r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25546 spamd 92r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25552 spamd 97r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25552 spamd 98r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25561 spamd 106r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25561 spamd 107r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25568 spamd 113r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25568 spamd 114r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25573 spamd 118r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25573 spamd 119r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25574 spamd 119r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25574 spamd 120r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25586 spamd 131r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25586 spamd 132r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25588 spamd 133r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25588 spamd 134r REG 8,1 172032 402828743 /var/spamassassin/bayesdb/bayes_seen
spamd 25592 spamd 137r REG 8,1 5132288 402667308 /var/spamassassin/bayesdb/bayes_toks
spamd 25592 spamd 138r REG 8,1 172032 402828743 /var/spamassassin/bayesdb
Re: Error "cannot open bayes databases" lock failed: File exists
On 20.01.21 11:07, Emanuel Gonzalez wrote:

> Date: Wed, 20 Jan 2021 11:07:59 +
> From: Emanuel Gonzalez
> To: SA Mailing list
> Subject: Re: Error "cannot open bayes databases" lock failed: File exists
>
> Hello everyone, i'm back from my vacations, i try solved this problem but i could not.
>
> I still see in the spamsassin error logs the mentioned error:
>
> bayes_learn_to_journal 1
> use_bayes yes
> bayes_path /var/spamassassin/bayesdb/bayes
> bayes_auto_learn 0
> bayes_auto_expire 0

try:

ls -la /var/spamassassin/bayesdb/bayes
lsof /var/spamassassin/bayesdb/bayes_journal /var/spamassassin/bayesdb/bayes_seen /var/spamassassin/bayesdb/bayes_toks

> -rw------- 1 spamd spamd 48984 ene 20 08:06 /var/spamassassin/bayesdb/bayes_journal
> -rwxr-xr-x 1 spamd spamd 172032 dic 18 10:52 /var/spamassassin/bayesdb/bayes_seen
> -rwxr-xr-x 1 spamd spamd 5132288 ene 20 08:05 /var/spamassassin/bayesdb/bayes_toks
>
> Jan 20 07:25:27 eternia6 spamd[22817]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Jan 20 07:25:27 eternia6 spamd[22916]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Jan 20 07:25:27 eternia6 spamd[22843]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
>
> Any ideas? i don't know how resolve this error.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: Error "cannot open bayes databases" lock failed: File exists
Hello everyone,

I'm back from vacation. I tried to solve this problem but could not.

I still see the error mentioned earlier in the SpamAssassin error logs. My config:

bayes_learn_to_journal 1
use_bayes yes
bayes_path /var/spamassassin/bayesdb/bayes
bayes_auto_learn 0
bayes_auto_expire 0

The database files:

-rw------- 1 spamd spamd 48984 ene 20 08:06 /var/spamassassin/bayesdb/bayes_journal
-rwxr-xr-x 1 spamd spamd 172032 dic 18 10:52 /var/spamassassin/bayesdb/bayes_seen
-rwxr-xr-x 1 spamd spamd 5132288 ene 20 08:05 /var/spamassassin/bayesdb/bayes_toks

The log:

Jan 20 07:25:27 eternia6 spamd[22817]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Jan 20 07:25:27 eternia6 spamd[22916]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Jan 20 07:25:27 eternia6 spamd[22843]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists

Any ideas? I don't know how to resolve this error.

Regards,
Emanuel.
Re: Error "cannot open bayes databases" lock failed: File exists
Emanuel Gonzalez wrote:
> # SpamAssassin Deamon config
> SPAMDOPTIONS="-u spamd --round-robin --min-children=30 --max-children=180 --min-spare=25 --max-spare=80 --timeout-child=60 --max-conn-per-child=150 -i -A 172.17.0.0/16,10.0.0.0/8,200.58.96.0/19,179.43.112.0/20,168.197.48.0/22,168.181.184.0/22,138.219.40.0/22,138.36.236.0/22,66.97.32.0/20"

Putting aside your Bayes error (which I'm pretty sure Matus answered), this seems like an awful lot of individual systems allowed to connect to a single spamd instance - it's not generally an end-user-accessible service. Do you really need to access this spamd instance from ~20,000 public IPs?

-kgd
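The "~20,000 public IPs" figure can be checked from the -A list: a /N block covers 2^(32-N) addresses. A minimal sketch of that arithmetic follows; count_addresses is a hypothetical helper, not part of SpamAssassin, and the CIDR blocks are the non-private ones from the quoted SPAMDOPTIONS.

```shell
#!/bin/sh
# Hypothetical helper: sum the number of IPv4 addresses covered by a
# list of CIDR blocks. Each /N block covers 2^(32-N) addresses.
count_addresses() {
    total=0
    for cidr in "$@"; do
        prefix_len=${cidr#*/}                        # text after the slash
        total=$(( total + (1 << (32 - prefix_len)) ))
    done
    printf '%s\n' "$total"
}

# Public (non-RFC 1918) blocks from the -A option quoted above.
count_addresses 200.58.96.0/19 179.43.112.0/20 168.197.48.0/22 \
    168.181.184.0/22 138.219.40.0/22 138.36.236.0/22 66.97.32.0/20
# prints 20480
```

The two private ranges in the same list, 172.17.0.0/16 and 10.0.0.0/8, add a further 16,842,752 addresses, but those are not publicly routable.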
Re: Error "cannot open bayes databases" lock failed: File exists
On 30.12.20 13:53, Emanuel Gonzalez wrote:
> Dec 30 09:56:57 eternia6 spamd[15993]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Dec 30 09:56:57 eternia6 spamd[15915]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Dec 30 09:56:58 eternia6 spamd[16002]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Dec 30 09:56:59 eternia6 spamd[15960]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Dec 30 09:57:00 eternia6 spamd[15847]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
> Dec 30 09:57:01 eternia6 spamd[15909]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
>
> Could this be a permissions error?

Apparently not. This looks like one process holding the database lock while another process tries to write to it.

> drwsr-sr-x 3 spamd spamd 20 dic 18 10:26 /var/spamassassin
> drwxr-xr-x 2 spamd spamd 60 dic 30 10:03 /var/spamassassin/bayesdb/
> -rw--- 1 spamd spamd 66960 dic 30 10:03 bayes_journal
> -rwxr-xr-x 1 spamd spamd 172032 dic 18 10:52 bayes_seen
> -rwxr-xr-x 1 spamd spamd 5132288 dic 30 10:03 bayes_toks
>
> # Bayes config
> use_bayes yes
> bayes_path /var/spamassassin/bayesdb/bayes
> bayes_auto_learn 0
> bayes_auto_expire 0
>
> # SpamAssassin Deamon config
> SPAMDOPTIONS="-u spamd --round-robin --min-children=30 --max-children=180 --min-spare=25 --max-spare=80 --timeout-child=60 --max-conn-per-child=150 -i -A 172.17.0.0/16,10.0.0.0/8,200.58.96.0/19,179.43.112.0/20,168.197.48.0/22,168.181.184.0/22,138.219.40.0/22,138.36.236.0/22,66.97.32.0/20"
>
> I have read various posts about this error, but I don't know how to resolve it.
> Any ideas or recommendations?

Try setting:

bayes_learn_to_journal 1

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
On the other hand, you have different fingers.
Error "cannot open bayes databases" lock failed: File exists
Good morning everyone,

In the SpamAssassin logs I see this error:

Dec 30 09:56:57 eternia6 spamd[15993]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Dec 30 09:56:57 eternia6 spamd[15915]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Dec 30 09:56:58 eternia6 spamd[16002]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Dec 30 09:56:59 eternia6 spamd[15960]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Dec 30 09:57:00 eternia6 spamd[15847]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists
Dec 30 09:57:01 eternia6 spamd[15909]: bayes: cannot open bayes databases /var/spamassassin/bayesdb/bayes_* R/W: lock failed: File exists

Could this be a permissions error?

drwsr-sr-x 3 spamd spamd 20 dic 18 10:26 /var/spamassassin
drwxr-xr-x 2 spamd spamd 60 dic 30 10:03 /var/spamassassin/bayesdb/
-rw--- 1 spamd spamd 66960 dic 30 10:03 bayes_journal
-rwxr-xr-x 1 spamd spamd 172032 dic 18 10:52 bayes_seen
-rwxr-xr-x 1 spamd spamd 5132288 dic 30 10:03 bayes_toks

# Bayes config
use_bayes yes
bayes_path /var/spamassassin/bayesdb/bayes
bayes_auto_learn 0
bayes_auto_expire 0

# SpamAssassin Deamon config
SPAMDOPTIONS="-u spamd --round-robin --min-children=30 --max-children=180 --min-spare=25 --max-spare=80 --timeout-child=60 --max-conn-per-child=150 -i -A 172.17.0.0/16,10.0.0.0/8,200.58.96.0/19,179.43.112.0/20,168.197.48.0/22,168.181.184.0/22,138.219.40.0/22,138.36.236.0/22,66.97.32.0/20"

I have read various posts about this error, but I don't know how to resolve it. Any ideas or recommendations?

Regards,
Emanuel.