https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7977

            Bug ID: 7977
           Summary: sa-learn --mbox broken in trunk
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: fr...@morgul.net
  Target Milestone: Undefined

I have not yet tracked this to a specific commit, but one of the recent changes
to ArchiveIterator.pm seems to have broken sa-learn --mbox.  I have tested this
under perl v5.34.0 and v5.32.1 (Debian unstable and stable, respectively).

Example of a failure:

noahm@74805e6e29ad:/tmp$ spamassassin --lint
noahm@74805e6e29ad:/tmp$ spamassassin --version
SpamAssassin version 4.0.0-r1899900
  running on Perl version 5.34.0
noahm@74805e6e29ad:/tmp$ sa-learn -D --spam --mbox < spam-2022-04-20_1040 2> debug.log
Learned tokens from 0 message(s) (0 message(s) examined)
noahm@74805e6e29ad:/tmp$ echo $?
1

The debug log contains messages about nonexistent temporary files:

Apr 21 15:44:53.887 [1396] dbg: bayes: expiry completed
Apr 21 15:44:53.888 [1396] dbg: util: secure_tmpfile created a temporary file /tmp/.spamassassin13964mXj0atmp
Apr 21 15:44:53.888 [1396] dbg: util: current PATH is: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Apr 21 15:44:53.888 [1396] dbg: util: executable for bzip2 was found at /bin/bzip2
Apr 21 15:44:53.888 [1396] dbg: util: executable for xz was found at /usr/bin/xz
Apr 21 15:44:53.888 [1396] dbg: util: executable for lzip was found at /usr/bin/lzip
Apr 21 15:44:53.888 [1396] dbg: util: executable for lzop was found at /usr/bin/lzop
Apr 21 15:44:53.888 [1396] dbg: archive-iterator: _set_default_message_selection_opts After: Scanprob[1], want_date[0], cache[0], from_regex[(?^:^From \\S+  ?(\\S\\S\\S \\S\\S\\S .?\\d .?\\d:\\d\\d:\\d\\d \\d{4}|.?\\d-\\d\\d-\\d{4}_\\d\\d:\\d\\d:\\d\\d_))]
Apr 21 15:44:53.889 [1396] dbg: archive-iterator: no access to /tmp/.spamassassin13964mXj0atmp.0: No such file or directory
Apr 21 15:44:53.889 [1396] dbg: archive-iterator: no access to /tmp/.spamassassin13964mXj0atmp.4149: No such file or directory
Apr 21 15:44:53.889 [1396] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x55dd494c44b8) implements 'learner_close', priority 0

If I revert the most recent changes to ArchiveIterator.pm (r1899848, r1899843,
and r1899836), sa-learn --mbox works again:

noahm@c30d8e8ec714:/src/spamassassin$ sa-learn -D --spam --mbox < spam-2022-04-20_1040 2> debug-revert.log
Learned tokens from 2 message(s) (2 message(s) examined)
noahm@c30d8e8ec714:/src/spamassassin$ echo $?
0

In this case, the debug output indicates that the mailbox content is being
parsed as expected:

Apr 21 15:55:49.067 [146] dbg: util: secure_tmpfile created a temporary file /tmp/.spamassassin146faHm2gtmp
Apr 21 15:55:49.067 [146] dbg: archive-iterator: _set_default_message_selection_opts After: Scanprob[1], want_date[0], cache[0], from_regex[(?^:^From \\S+  ?(\\S\\S\\S \\S\\S\\S .?\\d .?\\d:\\d\\d:\\d\\d \\d{4}|.?\\d-\\d\\d-\\d{4}_\\d\\d:\\d\\d:\\d\\d_))]
Apr 21 15:55:49.069 [146] dbg: archive-iterator: _run_mailbox /tmp/.spamassassin146faHm2gtmp, ofs 0, limit 512000
Apr 21 15:55:49.070 [146] dbg: config: time limit 300.0 s
Apr 21 15:55:49.071 [146] dbg: message: _decode_header return-path: <bounces+202204zz-noahm=debian....@tracker.debian.org>
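
As a cross-check that does not depend on SpamAssassin, the From-line pattern
printed in the from_regex[...] debug entry can be applied to the mailbox
directly to see how many message separators it contains. The following is a
minimal Python sketch, not SpamAssassin code. The function name
count_messages and the script name in the usage comment are hypothetical. It
assumes the pattern is written with single backslashes and that the line wrap
in the log stands for one space before \d{4}.

```python
import re
import sys

# Pattern transcribed from the archive-iterator debug entry (from_regex[...]).
# Assumptions: single backslashes, and a single space before \d{4} where the
# log line was wrapped. Compiled as a bytes pattern so that mailboxes with
# non-UTF-8 content can still be read.
FROM_RE = re.compile(
    rb"^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}"
    rb"|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)"
)


def count_messages(lines):
    """Count lines that look like mbox 'From ' separator lines."""
    return sum(1 for line in lines if FROM_RE.match(line))


if __name__ == "__main__":
    # Hypothetical usage: python3 count_mbox.py < spam-2022-04-20_1040
    print(count_messages(sys.stdin.buffer))
```

If the count agrees with the number of messages examined by the reverted
build, that would suggest the input file is well-formed and the problem lies
in how the temporary file is located rather than in the separator matching.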

-- 
You are receiving this mail because:
You are the assignee for the bug.
