https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7977

Bug ID: 7977
Summary: sa-learn --mbox broken in trunk
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Libraries
Assignee: dev@spamassassin.apache.org
Reporter: fr...@morgul.net
Target Milestone: Undefined

I have not yet tracked this to a specific commit, but one of the recent changes
to ArchiveIterator.pm seems to have broken sa-learn --mbox. I have tested this
under perl v5.34.0 and v5.32.1 (Debian unstable and stable, respectively).

Example of a failure:

noahm@74805e6e29ad:/tmp$ spamassassin --lint
noahm@74805e6e29ad:/tmp$ spamassassin --version
SpamAssassin version 4.0.0-r1899900
  running on Perl version 5.34.0
noahm@74805e6e29ad:/tmp$ sa-learn -D --spam --mbox < spam-2022-04-20_1040 2> debug.log
Learned tokens from 0 message(s) (0 message(s) examined)
noahm@74805e6e29ad:/tmp$ echo $?
1

debug log contains warnings related to nonexistent tmpfiles:

Apr 21 15:44:53.887 [1396] dbg: bayes: expiry completed
Apr 21 15:44:53.888 [1396] dbg: util: secure_tmpfile created a temporary file /tmp/.spamassassin13964mXj0atmp
Apr 21 15:44:53.888 [1396] dbg: util: current PATH is: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Apr 21 15:44:53.888 [1396] dbg: util: executable for bzip2 was found at /bin/bzip2
Apr 21 15:44:53.888 [1396] dbg: util: executable for xz was found at /usr/bin/xz
Apr 21 15:44:53.888 [1396] dbg: util: executable for lzip was found at /usr/bin/lzip
Apr 21 15:44:53.888 [1396] dbg: util: executable for lzop was found at /usr/bin/lzop
Apr 21 15:44:53.888 [1396] dbg: archive-iterator: _set_default_message_selection_opts After: Scanprob[1], want_date[0], cache[0], from_regex[(?^:^From \\S+ ?(\\S\\S\\S \\S\\S\\S .?\\d .?\\d:\\d\\d:\\d\\d \\d{4}|.?\\d-\\d\\d-\\d{4}_\\d\\d:\\d\\d:\\d\\d_))]
Apr 21 15:44:53.889 [1396] dbg: archive-iterator: no access to /tmp/.spamassassin13964mXj0atmp.0: No such file or directory
Apr 21 15:44:53.889 [1396] dbg: archive-iterator: no access to /tmp/.spamassassin13964mXj0atmp.4149: No such file or directory
Apr 21 15:44:53.889 [1396] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x55dd494c44b8) implements 'learner_close', priority 0

If I revert the most recent changes to ArchiveIterator.pm (r1899848, r1899843,
and r1899836) things work again:

noahm@c30d8e8ec714:/src/spamassassin$ sa-learn -D --spam --mbox < spam-2022-04-20_1040 2> debug-revert.log
Learned tokens from 2 message(s) (2 message(s) examined)
noahm@c30d8e8ec714:/src/spamassassin$ echo $?
0

In this case, the debug output indicates that the mailbox content is being
parsed as expected:

Apr 21 15:55:49.067 [146] dbg: util: secure_tmpfile created a temporary file /tmp/.spamassassin146faHm2gtmp
Apr 21 15:55:49.067 [146] dbg: archive-iterator: _set_default_message_selection_opts After: Scanprob[1], want_date[0], cache[0], from_regex[(?^:^From \\S+ ?(\\S\\S\\S \\S\\S\\S .?\\d .?\\d:\\d\\d:\\d\\d \\d{4}|.?\\d-\\d\\d-\\d{4}_\\d\\d:\\d\\d:\\d\\d_))]
Apr 21 15:55:49.069 [146] dbg: archive-iterator: _run_mailbox /tmp/.spamassassin146faHm2gtmp, ofs 0, limit 512000
Apr 21 15:55:49.070 [146] dbg: config: time limit 300.0 s
Apr 21 15:55:49.071 [146] dbg: message: _decode_header return-path: <bounces+202204zz-noahm=debian....@tracker.debian.org>

-- 
You are receiving this mail because:
You are the assignee for the bug.
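
For readers unfamiliar with the from_regex printed in both debug logs: it appears to be the pattern used to recognise mbox "From " separator lines, which is how a single mailbox file is divided into individual messages. The following is a minimal Python sketch of that idea, not SpamAssassin's own code. The pattern is transcribed from the log on the assumption that the doubled backslashes are log escaping, Perl's (?^:...) wrapper is dropped, and the function name message_offsets and the two-message sample are hypothetical. Real mbox readers usually also require a blank line before a separator, which this sketch skips.

```python
import re

# Pattern body copied from the from_regex in the debug log (assumption:
# the doubled backslashes there stand for single ones).
FROM_RE = re.compile(
    r"^From \S+ ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}"
    r"|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)"
)


def message_offsets(text):
    """Return the character offset of every line that matches FROM_RE.

    Hypothetical helper for illustration: each returned offset marks the
    start of one message in an mbox-style string.
    """
    offsets = []
    pos = 0
    for line in text.splitlines(keepends=True):
        if FROM_RE.match(line):
            offsets.append(pos)
        pos += len(line)
    return offsets


# Made-up two-message mailbox, standing in for the reporter's spam file.
SAMPLE = (
    "From alice@example.org Thu Apr 21 15:44:53 2022\n"
    "Subject: first\n"
    "\n"
    "body one\n"
    "\n"
    "From bob@example.org Wed Apr 20 10:40:00 2022\n"
    "Subject: second\n"
    "\n"
    "body two\n"
)

offsets = message_offsets(SAMPLE)
print(len(offsets), "message(s) found at offsets", offsets)
```

In the reverted run, the "_run_mailbox ..., ofs 0" line and the "2 message(s) examined" summary are consistent with this kind of offset-based splitting succeeding. In the failing run no message is examined, and the log instead reports "no access to" paths formed from the temporary file name plus a numeric suffix.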