https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8064

            Bug ID: 8064
           Summary: Sa-learn takes a very long time to learn each letter
           Product: Spamassassin
           Version: 3.4.6
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Learner
          Assignee: dev@spamassassin.apache.org
          Reporter: al...@data-netsoft.ru
  Target Milestone: Undefined

Created attachment 5845
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5845&action=edit
Log and settings

When I train the Bayesian classifier, each message takes more than 30 seconds
to learn, and I can't understand why this is happening. Here is a piece of the
sa-learn log where you can see the delay:


---- begin -----
sa-learn -D --spam --no-sync --username=vmail /tmp/111.msg
...
Oct 14 16:18:14.126 [482455] dbg: uri: canonicalizing parsed uri:
mailto:al...@mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: cleaned uri: mailto:al...@mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: added host: mydomain.com domain:
mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: canonicalizing domainkeys uri:
domainkeys:mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: cleaned uri: domainkeys:mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: added host: mydomain.com domain:
mydomain.com
Oct 14 16:18:14.358 [482455] dbg: bayes: tokenized body: 11 tokens
Oct 14 16:18:14.358 [482455] dbg: bayes: tokenized uri: 5 tokens
Oct 14 16:18:14.358 [482455] dbg: bayes: tokenized invisible: 0 tokens
Oct 14 16:18:14.360 [482455] dbg: bayes: tokenized header: 145 tokens
Oct 14 16:18:49.346 [482455] dbg: bayes: tokenized body: 11 tokens
Oct 14 16:18:49.346 [482455] dbg: bayes: tokenized uri: 5 tokens
Oct 14 16:18:49.346 [482455] dbg: bayes: tokenized invisible: 0 tokens
Oct 14 16:18:49.347 [482455] dbg: bayes: tokenized header: 145 tokens
Oct 14 16:19:25.725 [482455] dbg: bayes: seen
(92892bf23689ce621c550aee0ed36d2e8264a618@sa_generated) put
Oct 14 16:19:25.725 [482455] dbg: bayes: learned
'92892bf23689ce621c550aee0ed36d2e8264a618@sa_generated', atime: 1665752160
Oct 14 16:19:25.725 [482455] dbg: TxRep: learning a message
Oct 14 16:19:25.725 [482455] dbg: check: pms new, time limit in 228.393 s
Oct 14 16:19:25.725 [482455] dbg: message: using Return-Path header as
EnvelopeFrom: 'al...@mydomain.com'
Oct 14 16:19:25.725 [482455] dbg: check: tagrun - tag SENDERDOMAIN is now
ready, value: mydomain.com
Oct 14 16:19:25.725 [482455] dbg: check: tagrun - tag AUTHORDOMAIN is now
ready, value: mydomain.com
...

...
----- end ------
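
The pauses in the excerpt above fall between consecutive timestamps (roughly
35 s after the first "tokenized header" line and roughly 36 s after the
second). For the full attached log, a small script can locate such gaps
automatically. This is only a minimal sketch: the function name find_gaps and
the 1-second threshold are illustrative choices, not part of SpamAssassin, and
the timestamp format is assumed from the lines shown above.

```python
import re
from datetime import datetime

# Timestamp prefix of an "sa-learn -D" debug line, as seen in the excerpt:
# "Oct 14 16:18:14.360 [482455] dbg: ..."
STAMP = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}) \[\d+\] ")

def find_gaps(lines, threshold=1.0):
    """Return (gap_seconds, line_before, line_after) for each gap >= threshold."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # wrapped continuation line or "..." marker
        # The log carries no year; a fixed one is assumed so the stamp parses.
        t = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S.%f")
        if prev_time is not None:
            delta = (t - prev_time).total_seconds()
            if delta >= threshold:
                gaps.append((delta, prev_line, line.rstrip()))
        prev_time, prev_line = t, line.rstrip()
    return gaps

# Four lines copied from the excerpt above.
sample = [
    "Oct 14 16:18:14.360 [482455] dbg: bayes: tokenized header: 145 tokens",
    "Oct 14 16:18:49.346 [482455] dbg: bayes: tokenized body: 11 tokens",
    "Oct 14 16:18:49.347 [482455] dbg: bayes: tokenized header: 145 tokens",
    "Oct 14 16:19:25.725 [482455] dbg: bayes: seen",
]

for delta, before, after in find_gaps(sample):
    print(f"{delta:.3f} s pause after: {before}")
```

To run it on a complete log, the lines of the saved debug output can be passed
in place of sample, for example find_gaps(open(path)) with path pointing at
the saved file.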

At first I thought it might be AskDNS, so I commented out the plugin in the
v340.pre file, but that didn't help.

I can't understand why there is a delay at these points. I tried running
SpamAssassin without MySQL, and the training delay is about the same.
I didn't set any unusual parameters. Everything was set up from a clean
install.
I attach the full sa-learn log output, as well as all my configuration
files.

Otherwise, SpamAssassin works as it should in a
Postfix+Dovecot+SpamAssassin+Roundcube setup (Ubuntu 20.04). I need to get rid
of the delay, because when a user clicks the "spam" button in Roundcube, it
takes a very long time for the message to be learned. Users complain about
the delay.

-- 
You are receiving this mail because:
You are the assignee for the bug.
