Re: Beginner Setting up Spam Assassin

2023-12-29 Thread Jimmy
You can create a rule something like this:

header   BLOCK_EMAIL   From:addr =~ /user\@domain\.com/
describe BLOCK_EMAIL   Block email
score    BLOCK_EMAIL   5.00
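
One caveat worth checking before using the rule: the pattern is unanchored, so it also matches any address that merely contains user@domain.com. The Python sketch below illustrates this (Python's `re` behaves like Perl for this simple pattern). The sample addresses are made up, and the anchored, case-insensitive variant is a suggested tightening, not part of the rule as posted.

```python
import re

# Pattern as posted in the rule (unanchored), and a hypothetical anchored variant
# corresponding to /^user\@domain\.com$/i.
UNANCHORED = re.compile(r"user@domain\.com")
ANCHORED = re.compile(r"^user@domain\.com$", re.IGNORECASE)

# Made-up addresses, in the bare form that From:addr exposes (no display name).
SAMPLES = [
    "user@domain.com",
    "superuser@domain.com",
    "user@domain.community",
    "USER@DOMAIN.COM",
]

def matches(pattern, address):
    """Return True if the pattern matches anywhere in the address."""
    return pattern.search(address) is not None

if __name__ == "__main__":
    for addr in SAMPLES:
        print(addr, matches(UNANCHORED, addr), matches(ANCHORED, addr))
```

If only the exact address should be hit, anchoring the pattern in the header rule is probably the safer choice.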

On Sat, Dec 30, 2023 at 10:08 AM FalconChristopher <
falconchristop...@bell.net> wrote:

> Does anyone know how I can check and set up SpamAssassin so that I can
> eliminate some spam coming in from an email account?
>
>
> On 12/28/2023 2:24 AM, Matus UHLAR - fantomas wrote:
> > On 27.12.23 16:53, FalconChristopher wrote:
> >> Hi, I want to setup Spam Assassin so that any email that Spam
> >> Assassin flags as spam
> >
> > this is spamassassin's job
> >
> >> gets placed into a folder for a specific SMTP or IMAP email account.
> >
> > This is not SpamAssassin's job.
> > It is the job of the mail delivery agent: procmail, maildrop, or sieve.
> >
> >> Then if Spam Assassin flags emails that are not spam I can tell it
> >> which of those emails to not place into the spam folder for the
> >> specific email client. Until it gradually learns which emails are
> >> spam and which are not.
> >
> > dovecot (imap/pop3 server) has plugins that support training of
> > spam/ham, if you move the mail from/to spam folder.
> >
> > https://doc.dovecot.org/configuration_manual/spam_reporting/
> >
> >> I've done a little research and I have access with my distribution to
> >> a mail directory as well as the local.cf file for which
> >> configurations are for Spam Assassin but I don't know how to setup
> >> what I mentioned above ?
> >
>
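
As the quoted reply points out, filing flagged mail into a folder is left to the delivery agent. Below is a minimal Python sketch of the decision such an agent makes, assuming SpamAssassin has already added its usual X-Spam-Flag header. The folder names "Junk" and "INBOX" and the sample messages are illustrative assumptions; a real setup would normally express this as a procmail, maildrop, or sieve rule instead.

```python
from email import message_from_string

def target_folder(raw_message, spam_folder="Junk", default_folder="INBOX"):
    """Return the folder a message should be filed into, based on X-Spam-Flag."""
    msg = message_from_string(raw_message)
    # A missing header is treated the same as "not spam".
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return spam_folder if flag == "YES" else default_folder

# Hypothetical sample messages: one flagged by SpamAssassin, one not.
SPAM = "From: a@example.org\nX-Spam-Flag: YES\nSubject: win\n\nbody\n"
HAM = "From: b@example.org\nSubject: hello\n\nbody\n"

if __name__ == "__main__":
    print(target_folder(SPAM))
    print(target_folder(HAM))
```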


Re: Beginner Setting up Spam Assassin

2023-12-29 Thread FalconChristopher
Does anyone know how I can check and set up SpamAssassin so that I can
eliminate some spam coming in from an email account?

On 12/28/2023 2:24 AM, Matus UHLAR - fantomas wrote:
> On 27.12.23 16:53, FalconChristopher wrote:
>> Hi, I want to set up SpamAssassin so that any email that
>> SpamAssassin flags as spam
>
> This is SpamAssassin's job.
>
>> gets placed into a folder for a specific SMTP or IMAP email account.
>
> This is not SpamAssassin's job.
> It is the job of the mail delivery agent: procmail, maildrop, or sieve.
>
>> Then if SpamAssassin flags emails that are not spam I can tell it
>> which of those emails to not place into the spam folder for the
>> specific email client. Until it gradually learns which emails are
>> spam and which are not.
>
> Dovecot (IMAP/POP3 server) has plugins that support training of
> spam/ham if you move the mail to or from the spam folder.
>
> https://doc.dovecot.org/configuration_manual/spam_reporting/
>
>> I've done a little research and I have access with my distribution to
>> a mail directory as well as the local.cf file, which holds the
>> SpamAssassin configuration, but I don't know how to set up
>> what I mentioned above.




Re: Spreadsheet::Excel ?

2023-12-29 Thread Bill Cole

On 2023-12-29 at 08:41:23 UTC-0500 (Fri, 29 Dec 2023 08:41:23 -0500)
Alex
is rumored to have said:

> Hi,
>
> Barracuda recently announced they've identified a vulnerability in the
> Spreadsheet::Excel library used by amavis in their appliances. I didn't
> realize they were still using amavis and open source (and presumably
> spamassassin?).
> https://www.barracuda.com/company/legal/esg-vulnerability


Barracuda has never been entirely open about their components, but they 
started as a very typical Postfix/Amavis/SpamAssassin/ClamAV rig.


> I don't have this library on my system - is there a plugin that enables
> parsing of Excel spreadsheets for malicious code?


The OLEVBMacro plugin exists. It does not use Spreadsheet::Excel. Malice 
is out of scope, but since mailing around MS files with macros has never 
been a good idea, discriminating between malice or sheer blinding 
stupidity is non-critical.


In my experience it has been workable to just reject mail with .xls and 
.xlsx attachments by default at any Internet-facing MX. 20+ years of 
warnings about how reckless it is to share MS documents ought to suffice 
for anyone.
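
A minimal sketch of the "reject by attachment extension" idea in Python, using only the standard `email` package. It is not how amavis or any particular MX implements the check; the function names, the blocked-extension tuple, and the sample message are assumptions made for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Extensions to refuse, per the policy described above.
BLOCKED_EXTENSIONS = (".xls", ".xlsx")

def blocked_attachments(raw_bytes):
    """Return the filenames of all MIME parts whose name ends in a blocked extension."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    names = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename and filename.lower().endswith(BLOCKED_EXTENSIONS):
            names.append(filename)
    return names

def build_sample(filename):
    """Build a hypothetical message with one dummy attachment, for testing."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "rcpt@example.org"
    msg["Subject"] = "report"
    msg.set_content("See attachment.")
    msg.add_attachment(b"dummy", maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg.as_bytes()

if __name__ == "__main__":
    print(blocked_attachments(build_sample("Budget.XLSX")))
    print(blocked_attachments(build_sample("notes.txt")))
```

A filename check like this says nothing about the content of the file; it only enforces the blanket policy.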



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Spreadsheet::Excel ?

2023-12-29 Thread Benny Pedersen

Alex wrote on 2023-12-29 14:41:

> Hi,
>
> Barracuda recently announced they've identified a vulnerability in the
> Spreadsheet::Excel library used by amavis in their appliances. I
> didn't realize they were still using amavis and open source (and
> presumably spamassassin?).
> https://www.barracuda.com/company/legal/esg-vulnerability

This link provides YARA rules that can be used in the ClamAV database
directory.

> I don't have this library on my system - is there a plugin that
> enables parsing of Excel spreadsheets for malicious code? I realize
> there is the ExtractText plugin, and although it doesn't actually work
> to identify any potentially malicious code within an Excel file, it
> does look to be much more comprehensive and capable.
>
> https://www.techtarget.com/searchsecurity/news/366564654/Another-Barracuda-ESG-zero-day-flaw-exploited-in-the-wild

amavisd can block .xls files if they are not wanted.

A longer-term solution is to get the malware added to ClamAV if possible;
sadly, that is not easy :/

Test the malware on virustotal.com and hope the AV vendors add it to their
malware databases; sadly, ClamAV doesn't pick it up :/




Spreadsheet::Excel ?

2023-12-29 Thread Alex
Hi,

Barracuda recently announced they've identified a vulnerability in the
Spreadsheet::Excel library used by amavis in their appliances. I didn't
realize they were still using amavis and open source (and presumably
spamassassin?).
https://www.barracuda.com/company/legal/esg-vulnerability

I don't have this library on my system - is there a plugin that enables
parsing of Excel spreadsheets for malicious code? I realize there is the
ExtractText plugin, and although it doesn't actually work to identify any
potentially malicious code within an Excel file, it does look to be much
more comprehensive and capable.

https://www.techtarget.com/searchsecurity/news/366564654/Another-Barracuda-ESG-zero-day-flaw-exploited-in-the-wild


Re: Bayes Stopword

2023-12-29 Thread Jimmy
This is what I believe: the words need to be trimmed or separated, and
careful consideration is required to determine the language in order to
perform accurate cutoffs.
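
To illustrate what "separating" could look like for a script written without spaces, here is a minimal Python sketch of greedy longest-match segmentation against a dictionary. This is only one common approach and is not something SpamAssassin is shown to do in this thread; the function name and the tiny hand-picked dictionary are assumptions for illustration.

```python
def segment(token, dictionary):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary word that starts there;
    characters not covered by any word are emitted one at a time.
    Assumes a non-empty dictionary.
    """
    max_len = max(map(len, dictionary))
    pieces, i = [], 0
    while i < len(token):
        for size in range(min(max_len, len(token) - i), 0, -1):
            candidate = token[i:i + size]
            if candidate in dictionary:
                pieces.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: keep the single character.
            pieces.append(token[i])
            i += 1
    return pieces

# Illustrative words only; a real list would come from a proper Thai lexicon.
WORDS = ["ทุก", "วัน", "พุธ", "เล่น", "ชนะ", "รับ", "เพิ่ม"]
DICTIONARY = set(WORDS)

if __name__ == "__main__":
    print(segment("".join(WORDS), DICTIONARY))
    print(segment("theme", {"the", "me"}))
```

The second call shows why the language matters: applied blindly to English, the same method splits "theme" into "the" and "me".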

Jimmy

On Fri, Dec 29, 2023 at 5:16 PM  wrote:

> "ทุก" is not considered a word because it's part of the token
> "ทุกวันพุธเล่นชนะรับเพิ่ม".
> Words must be separated by spaces, otherwise we should skip the word
> "theme" just because "the" is in english stopword list.
> No idea if this makes sense for asian languages.
>
>   Giovanni

Re: Bayes Stopword

2023-12-29 Thread giovanni

"ทุก" is not considered a word because it's part of the token 
"ทุกวันพุธเล่นชนะรับเพิ่ม".
Words must be separated by spaces; otherwise we would have to skip the word "theme" just
because "the" is in the English stopword list.
I have no idea whether this makes sense for Asian languages.
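
The whole-token behaviour described above can be sketched in a few lines of Python. This is not SpamAssassin's tokenizer (which is written in Perl and does considerably more); the two-word stopword list and the function name are assumptions chosen to mirror the examples in this message.

```python
import re

# Tiny illustrative stopword list (an assumption, not SpamAssassin's shipped list).
STOPWORDS = ["the", "ทุก"]
STOPWORD_RE = re.compile("(?:" + "|".join(re.escape(w) for w in STOPWORDS) + ")")

def kept_tokens(text):
    """Split on whitespace and drop only tokens that are exactly a stopword."""
    return [tok for tok in text.split() if not STOPWORD_RE.fullmatch(tok)]

if __name__ == "__main__":
    # "the" is dropped, "theme" is kept.
    print(kept_tokens("the theme"))
    # A stopword glued to following text forms one longer token and is kept.
    print(kept_tokens(STOPWORDS[1] + "วันพุธ"))
```

Because the match must cover the whole token, a stopword embedded in a longer run of characters without spaces is never skipped.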

 Giovanni

On 12/29/23 11:04, Jimmy wrote:
> The sample email and word list should contain at least these words.
>
> ถูก
> เลย
> ทุก
>
> Jimmy


Re: Bayes Stopword

2023-12-29 Thread Jimmy
The sample email and word list should contain at least these words.

ถูก
เลย
ทุก

Jimmy

On Fri, Dec 29, 2023 at 4:47 PM  wrote:

> I do not speak Thai but I cannot see any word in the sample email that
> should match that list.
> Which word do you think should match the regexp ?
>   Giovanni

Re: Bayes Stopword

2023-12-29 Thread giovanni

I do not speak Thai but I cannot see any word in the sample email that should 
match that list.
Which word do you think should match the regexp ?
 Giovanni

On 12/29/23 10:08, Jimmy wrote:
> You can use this word list
>
> https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt
>
> Jimmy


Re: Bayes Stopword

2023-12-29 Thread Jimmy
You can use this word list

https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt

Jimmy

On Fri, Dec 29, 2023 at 3:59 PM  wrote:

> To create the stopwords regexp I used the script I shared in a previous
> email and a list of words one per line.
> Could you share the list you are using ?
>
>Giovanni

Re: Bayes Stopword

2023-12-29 Thread giovanni

To create the stopwords regexp I used the script I shared in a previous email 
and a list of words one per line.
Could you share the list you are using ?
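
For readers without the script at hand, here is a minimal Python sketch of turning a one-word-per-line list into a single regexp. It is not the script referred to above, and it builds a plain alternation rather than the compact trie-style patterns that the Perl modules named elsewhere in this thread produce. The function names and the optional `hex_escape` switch (which mirrors the "converted it to UTF-8 hex" step mentioned in the thread) are assumptions.

```python
import re

def utf8_hex(word):
    """Spell a word out as \\xNN escapes of its UTF-8 bytes."""
    return "".join(f"\\x{byte:02x}" for byte in word.encode("utf-8"))

def build_stopword_pattern(lines, hex_escape=False):
    """Build one alternation pattern from an iterable of lines, one word per line."""
    # Strip whitespace, skip blank lines, and remove duplicates.
    words = {line.strip() for line in lines if line.strip()}
    # Longest first, so a short word never shadows a longer one sharing its prefix.
    ordered = sorted(words, key=lambda w: (-len(w), w))
    render = utf8_hex if hex_escape else re.escape
    return "(?:" + "|".join(render(w) for w in ordered) + ")"

if __name__ == "__main__":
    sample = ["the\n", "then\n", "\n", "a.b\n"]
    print(build_stopword_pattern(sample))
    print(build_stopword_pattern(["\u0e1a"], hex_escape=True))
```

In practice the list would be read from a file, for example by passing an open file object as `lines`.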

  Giovanni

On 12/29/23 09:22, Jimmy wrote:
> I use SpamAssassin 4.0.0 (2022-12-14)
>
> $ spamassassin -D --lint 2>&1 | grep bayes:
> Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=en
> Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th
> Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=ru
> Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=fr
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ja
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=zh
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=dk
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=nl
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=de
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=es
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fi
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fr
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=it
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=no
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ru
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=se
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=tr
> Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=vi
> Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=ko
> Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=zh
> Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=hi
> Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for languages enabled: en th ru fr ja zh dk nl de es fi fr it no ru se tr vi ko zh hi
>
>
> $ spamassassin -D bayes,learn < test.msg 2>&1 | grep "skipped token"
> Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' because it's in stopword list for language 'en'
>
> You can test with "บาท", which is listed in the regexp pattern; I don't
> know why it does not show up as a skipped token in Bayes.
>
> Jimmy



Re: Bayes Stopword

2023-12-29 Thread Jimmy
I use SpamAssassin 4.0.0 (2022-12-14)

$ spamassassin -D --lint 2>&1 | grep bayes:
Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=en
Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th
Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=ru
Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=fr
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ja
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=zh
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=dk
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=nl
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=de
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=es
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fi
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=fr
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=it
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=no
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=ru
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=se
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=tr
Dec 29 15:17:56.920 [17420] dbg: bayes: stopword found lang=vi
Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=ko
Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=zh
Dec 29 15:17:56.921 [17420] dbg: bayes: stopword found lang=hi
Dec 29 15:17:58.019 [17420] dbg: bayes: stopwords for languages enabled: en
th ru fr ja zh dk nl de es fi fr it no ru se tr vi ko zh hi


$ spamassassin -D bayes,learn < test.msg 2>&1 | grep "skipped token"
Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' because it's
in stopword list for language 'en'
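
When checking several languages, it can help to group the skipped tokens by language instead of eyeballing the grep output. A minimal Python sketch follows; the regular expression is modelled on the single "skipped token" line shown above (unwrapped onto one line), so other SpamAssassin versions may word the message differently, and the function name is an assumption.

```python
import re
from collections import defaultdict

# Modelled on the debug line shown above; treat the exact wording as an assumption.
SKIPPED_RE = re.compile(
    r"bayes: skipped token '(?P<token>.*)' because it's in stopword list "
    r"for language '(?P<lang>[^']+)'"
)

def skipped_by_language(debug_lines):
    """Map each language code to the list of tokens reported as skipped."""
    result = defaultdict(list)
    for line in debug_lines:
        match = SKIPPED_RE.search(line)
        if match:
            result[match.group("lang")].append(match.group("token"))
    return dict(result)

if __name__ == "__main__":
    sample = [
        "Dec 29 15:16:57.585 [17347] dbg: bayes: skipped token 'Email' "
        "because it's in stopword list for language 'en'",
        "Dec 29 15:17:56.919 [17420] dbg: bayes: stopword found lang=th",
    ]
    print(skipped_by_language(sample))
```

A language that is enabled but never appears in the result, as with th here, is the symptom being discussed in this thread.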

You can test with "บาท", which is listed in the regexp pattern; I don't
know why it does not show up as a skipped token in Bayes.

Jimmy


On Fri, Dec 29, 2023 at 2:59 PM  wrote:

> Config line produces a syntax error for me:
> config: failed to parse line in /etc/mail/spamassassin/local.cf (line 1):
> bayes_stopword_th
>
> Could you share the word list in utf8 ?
> I tried adding "บาท" to
> https://raw.githubusercontent.com/stopwords-iso/stopwords-th/master/stopwords-th.txt
> and it produces a working regexp.
> Bayes stopwords languages must also be enabled using
> "bayes_stopword_languages" config keyword, by default only english is
> enabled.
>   Giovanni
>
> On 12/28/23 17:06, Jimmy wrote:
> > bayes_stopword_th https://pastebin.pl/view/0838138d
> > Sample mail https://pastebin.pl/view/e5a2c5b8
> >
> > Jimmy
> >
> >
> > On Thu, Dec 28, 2023 at 10:59 PM <giova...@paclan.it> wrote:
> >
> > Could you share a config line and a sample you are using ?
> >    Giovanni
> >
> > On 12/28/23 16:26, Jimmy wrote:
> >  > Yes, I have done that, and I am also editing Plugin/Bayes.pm to investigate why it is not being skipped. I suspect that if words are not separated by spaces, longer words may not match those patterns.
> >  >
> >  > Jimmy
> >  >
> >  > On Thu, Dec 28, 2023 at 10:13 PM <giova...@paclan.it> wrote:
> >  >
> >  > "spamassassin -D bayes" will tell you, you should see a line like:
> >  > bayes: skipped token 'from' because it's in stopword list for language 'en'
> >  >
> >  >    Giovanni
> >  >
> >  > On 12/28/23 15:45, Jimmy wrote:
> >  >  > The pattern has successfully passed the test script, but it needs to check whether Bayes learning will identify and possibly exclude the word from matching this pattern.
> >  >  >
> >  >  > Thank you.
> >  >  >
> >  >  >
> >  >  > On Thu, Dec 28, 2023 at 9:22 PM <giova...@paclan.it> wrote:
> >  >  >
> >  >  > On 12/28/23 12:59, Jimmy wrote:
> >  >  >  > Hi,
> >  >  >  >
> >  >  >  > I'm seeking assistance in incorporating a stopword for Asian languages in Unicode. Although I possess comprehensive word lists, my attempts to generate a regex pattern and test it have been unsuccessful; the pattern fails to match or skips tokens in the newly added stopword list.
> >  >  >  >
> >  >  >  > I created the regex pattern using the following code:
> >  >  >  >
> >  >  >  > Regexp::Assemble->new->add(@words)->reduce(0)->as_string
> >  >  >  >
> >  >  >  > Afterward, I converted it to UTF-8 hex.
> >  >  >  >
> >  >  >  > I'm wondering if there are any tools available to facilitate the creation of these regex patterns.
> >  >  >  >
> >  >  > I have used Regexp::Trie to create Bayes