Re: sa-learn won't read db created via MSTOR

2017-07-10 Thread RW
On Sat, 8 Jul 2017 21:55:36 +0100
RW wrote:

> On Sat, 8 Jul 2017 14:14:42 -0500
> Jerry Malcolm wrote:

> As a proof of concept try a small mbox file with
> 
> mbox_format_from_regex /^From\s/

and if it works try this instead:


/^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4})/
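One way to sanity-check the pattern above before putting it into the configuration is to run it against a few sample separator lines. The sketch below uses Python's re module as a stand-in for Perl's regex engine (an assumption; the two behave the same for the constructs used here). Only the unpadded separator comes from this thread; the other sample lines are made up for illustration.

```python
import re

# The relaxed pattern from this message, without the surrounding slashes.
# ".?\d" accepts a day or hour written with one or two characters.
RELAXED_FROM = re.compile(
    r"^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4})"
)

def is_separator(line: str) -> bool:
    """Return True if the line looks like an mbox 'From ' separator."""
    return RELAXED_FROM.search(line) is not None

samples = [
    "From - Sat Jul 8 01:02:28 2017",    # separator quoted in this thread
    "From - Sat Jul  8 01:02:28 2017",   # hypothetical: day padded with a space
    "From - Tue Jul 18 01:02:28 2017",   # hypothetical: two-digit day
    "From the desk of someone",          # hypothetical: ordinary body text
]
for line in samples:
    print(is_separator(line), repr(line))
```

The first three lines should be reported as separators and the last one should not, which is the behaviour wanted from a stricter replacement for /^From\s/.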



Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread Antony Stone
On Saturday 08 July 2017 at 22:55:36, RW wrote:

> I had a spillage and most of the punctuation characters
> on my keyboard aren't working at the moment.

Oh dear, my sympathies - but what a splendid quote on a mailing list :)


Antony.

-- 
Salad is what food eats.

   Please reply to the list;
 please *don't* CC me.


Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread RW
On Sat, 8 Jul 2017 14:14:42 -0500
Jerry Malcolm wrote:

> Thanks for the info.  Unfortunately, I don't have a clue how to 
> interpret a regex expression.  I couldn't find any reference to 
> mbox_format_from_regex in the 3.1.x Mail::SpamAssassin::Conf that
> came up when I googled it.

I hope you aren't actually running 3.1.x because that's ten years old.
 
> The separators in my mbox file are:
> 
>  From - Sat Jul 8 01:02:28 2017

That looks to be the problem.

As a proof of concept try a small mbox file with

mbox_format_from_regex /^From\s/

This is actually all you need if the mbox files are properly formatted
and lines that start "From " are escaped. I would give you a fuller
replacement but I had a spillage and most of the punctuation characters
on my keyboard aren't working at the moment.
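To illustrate why the short pattern is only safe when body lines are escaped, here is a small Python sketch (Python's re standing in for Perl's regex engine, which is an assumption) that counts the lines a /^From\s/ separator would treat as message starts. The sample messages are invented; only the separator format is taken from this thread.

```python
import re

# Same idea as "mbox_format_from_regex /^From\s/": any line that starts
# with "From" followed by whitespace counts as a message separator.
SIMPLE_FROM = re.compile(r"^From\s")

def count_separators(lines):
    """Count the lines that the simple pattern treats as message starts."""
    return sum(1 for line in lines if SIMPLE_FROM.match(line))

escaped = [
    "From - Sat Jul 8 01:02:28 2017",
    "Subject: first",
    "",
    ">From here on the body line is escaped",
    "",
    "From - Sat Jul 8 01:05:00 2017",
    "Subject: second",
    "",
    "body",
]
# Same mailbox, but with the body line left unescaped.
unescaped = [line.lstrip(">") for line in escaped]

print(count_separators(escaped))    # prints 2
print(count_separators(unescaped))  # prints 3
```

With the unescaped body line, the simple pattern finds three "messages" where there are only two, which is why a fuller pattern that also checks the date is preferable for files that are not carefully escaped.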


Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread Jerry Malcolm
Upon further investigation, I don't think sa-learn is even attempting to 
open the file.   I get the exact same message whether I give it a real 
file or just a string of characters for a file name:


[C:\Program Files\JAM Software\SpamAssassin in a Box]sa-learn.exe --spam 
--mbox c:\IMAPUtil\temp\uncaughtSpam.mstor\temp

Learned tokens from 0 message(s) (0 message(s) examined)

[C:\Program Files\JAM Software\SpamAssassin in a Box]sa-learn.exe --spam 
--mbox 

Learned tokens from 0 message(s) (0 message(s) examined)

This can't be right.  How can I tell if it's really reading the file?







Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread Jerry Malcolm
Thanks for the info.  Unfortunately, I don't have a clue how to 
interpret a regex expression.  I couldn't find any reference to 
mbox_format_from_regex in the 3.1.x Mail::SpamAssassin::Conf that came 
up when I googled it.


The separators in my mbox file are:

From - Sat Jul 8 01:02:28 2017

Can someone who speaks regex tell me if this syntax is my problem, and 
if so, point me to where I can find the correct regex that matches this 
that I can copy/paste?


Thanks.

Jerry


On 7/8/2017 8:45 AM, RW wrote:
> On Sat, 8 Jul 2017 01:57:47 -0500
> Jerry Malcolm wrote:
>
>> Below is a complete log dump from the -D option on sa-learn.
> ...
>
>> _set_default_message_selection_opts After: Scanprob[1], want_date[0],
>> cache[0], from_regex[^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d
>> \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)]
>
> Check that this default regex matches your mbox separator, you may need
> to set mbox_format_from_regex. See the Mail::SpamAssassin::Conf
> documentation




Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread RW
On Sat, 8 Jul 2017 01:57:47 -0500
Jerry Malcolm wrote:

> Below is a complete log dump from the -D option on sa-learn. 
...

> _set_default_message_selection_opts After: Scanprob[1], want_date[0], 
> cache[0], from_regex[^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d 
> \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)]

Check that this default regex matches your mbox separator; if it does
not, you may need to set mbox_format_from_regex. See the
Mail::SpamAssassin::Conf documentation.
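The check suggested here can be done mechanically. The sketch below runs the default pattern, copied from the debug line quoted above with the wrapped lines joined by a single space (an assumption about how the mail client wrapped it), against the separator reported elsewhere in this thread. It uses Python's re module in place of Perl, which should behave the same for these constructs. The padded variant is a hypothetical line added for comparison, and the archive may have collapsed whitespace in the quoted separator.

```python
import re

# Default from_regex as printed in the -D output, wrapped lines rejoined.
DEFAULT_FROM = re.compile(
    r"^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}"
    r"|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)"
)

def matches_default(separator: str) -> bool:
    """Return True if the default pattern accepts the separator line."""
    return DEFAULT_FROM.search(separator) is not None

unpadded = "From - Sat Jul 8 01:02:28 2017"   # as reported in this thread
padded = "From - Sat Jul  8 01:02:28 2017"    # hypothetical: day padded to two columns

print(matches_default(unpadded))  # prints False
print(matches_default(padded))    # prints True
```

The ".\d" in the default pattern needs two characters for the day of the month, so a single-digit day written without padding is rejected. That would be consistent with sa-learn reporting 0 messages examined.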


Re: sa-learn won't read db created via MSTOR

2017-07-08 Thread Jerry Malcolm
Below is a complete log dump from the -D option on sa-learn.  I am 
really curious that the file name I passed in is never even mentioned in 
the log. Is that expected? Do I have some sort of syntax error passing 
the mbox filename in?  Here's the command:


 [C:\Program Files\JAM Software\SpamAssassin in a Box]sa-learn -D 
--spam --showdots --mbox c:\imaputil\temp\uncaughtspam.mstor\temp


Thx,

Jerry

Jul  8 01:47:42.704 [12972] dbg: logger: adding facilities: all
Jul  8 01:47:42.704 [12972] dbg: logger: logging level is DBG
Jul  8 01:47:42.704 [12972] dbg: generic: SpamAssassin version 3.4.1
Jul  8 01:47:42.704 [12972] dbg: generic: Perl 5.022001, 
PREFIX=C:\Program Files\JAM Software\SpamAssassin in a Box\runtime, 
DEF_RULES_DIR=C:\ProgramData\JAM Software\spamdService\sa-rules, 
LOCAL_RULES_DIR=C:\ProgramData\JAM Software\spamdService\sa-config, 
LOCAL_STATE_DIR=..\share

Jul  8 01:47:42.705 [12972] dbg: config: timing enabled
Jul  8 01:47:42.706 [12972] dbg: config: score set 0 chosen.
Jul  8 01:47:42.712 [12972] dbg: util: running in taint mode? no
Jul  8 01:47:42.712 [12972] dbg: util: defining getpwuid() wrapper using 
'unknown' as username
Jul  8 01:47:42.715 [12972] dbg: config: using "C:\ProgramData\JAM 
Software\spamdService\sa-config" for site rules pre files
Jul  8 01:47:42.715 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/init.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v310.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v312.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v320.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v330.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v340.pre
Jul  8 01:47:42.716 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/v341.pre
Jul  8 01:47:42.717 [12972] dbg: config: using "C:\ProgramData\JAM 
Software\spamdService\sa-rules" for sys rules pre files
Jul  8 01:47:42.717 [12972] dbg: config: using "C:\ProgramData\JAM 
Software\spamdService\sa-rules" for default rules dir
Jul  8 01:47:42.717 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/sa_zmi_at.cf
Jul  8 01:47:42.718 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/sought_rules_yerp_org.cf
Jul  8 01:47:42.718 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/spamassassin_heinlein-support_de.cf
Jul  8 01:47:42.718 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/updates_spamassassin_org.cf
Jul  8 01:47:42.718 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-rules/xsaupdate_jam-software_com.cf
Jul  8 01:47:42.718 [12972] dbg: config: using "C:\ProgramData\JAM 
Software\spamdService\sa-config" for site rules dir
Jul  8 01:47:42.719 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/20_khop_bl.cf
Jul  8 01:47:42.719 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/contact.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/jam.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/jam_DNSBL.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/jam_example_rules.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/jam_virus_bounce_rules.cf
Jul  8 01:47:42.720 [12972] dbg: config: read file C:\ProgramData\JAM 
Software\spamdService\sa-config/local.cf
Jul  8 01:47:42.721 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::URIDNSBL from @INC
Jul  8 01:47:42.727 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::Hashcash from @INC
Jul  8 01:47:42.733 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::SPF from @INC
Jul  8 01:47:42.738 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::Pyzor from @INC

Jul  8 01:47:42.740 [12972] dbg: pyzor: network tests on, attempting Pyzor
Jul  8 01:47:42.740 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::Razor2 from @INC

Jul  8 01:47:42.806 [12972] dbg: razor2: razor2 is available, version 2.84
Jul  8 01:47:42.806 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::SpamCop from @INC
Jul  8 01:47:45.307 [12972] dbg: reporter: network tests on, attempting 
SpamCop
Jul  8 01:47:45.307 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::AutoLearnThreshold from @INC
Jul  8 01:47:45.309 [12972] dbg: plugin: loading 
Mail::SpamAssassin::Plugin::TextCat from @INC
Jul  8 01:47:45.313 [12972] dbg: textcat: loading languages file 

sa-learn won't read db created via MSTOR

2017-07-07 Thread Jerry Malcolm
My client mail repository is in a sql db and is not an option for 
sa-learn to read directly.  That's fine.  I wrote a utility that reads 
all the mail out of the uncaught-spam folder from my db and creates an 
mbox folder using the mstor java package.  The mbox file gets created 
with no problem.  When I run sa-learn, it says 0 messages were 
examined.  The mbox folder has about 2500 spam messages in it.  I've 
seen lots of discussion on the forums about whether or not sa-learn will 
'process' a message based on whether it's processed it before, etc.  I 
understand that.   But this is the very first time I've ever tried to 
run sa-learn.   And this error implies that it is not even finding any 
messages to process.


Here's the command and response: (Win server 2008)

[C:\Program Files\JAM Software\SpamAssassin in a Box]sa-learn 
--spam --mbox --showdots c:\imaputil\temp\uncaughtspam.mstor\temp


Learned tokens from 0 message(s) (0 message(s) examined)

I've used the mstor package before and have had zero problems with it.  
So I have no reason to assume it's creating a corrupted mbox folder 
file.  The mbox folder is present and is being found (I tried renaming 
it and got a 'not found' error from sa-learn). I've opened it in an 
editor, and to the extent I can tell, it looks like an mbox file.  There 
is about a 10-15 sec time lapse while sa-learn is 'running' before it 
displays the message.  So it appears that it's reading the mbox file.  
But for some reason it thinks there are no messages inside it.


I'm at a loss right now.  Is there any way to get additional information 
on why it thinks there are no messages in the mbox file?  I can post the 
mbox file if necessary.  If there are any debug flags that will help me 
figure out what is wrong, I can do debug as well.
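One independent check, separate from sa-learn itself, is to count the messages with a different mbox parser. The sketch below uses the mailbox module from Python's standard library, which treats any line starting with "From " as a message boundary. It says nothing about what sa-learn will accept; it only confirms that the file is mbox-shaped. The sample content and file name are invented for the demonstration.

```python
import mailbox
import os
import tempfile

def count_mbox_messages(path: str) -> int:
    """Count the messages Python's mbox parser finds in the file at path."""
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()

# Demonstration with a tiny two-message mbox written to a temporary file.
sample = (
    b"From - Sat Jul 8 01:02:28 2017\n"
    b"Subject: first\n\nbody one\n\n"
    b"From - Sat Jul 8 01:05:00 2017\n"
    b"Subject: second\n\nbody two\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sample.mbox")
    with open(demo_path, "wb") as fh:
        fh.write(sample)
    print(count_mbox_messages(demo_path))  # prints 2
```

If a count like this comes back close to the expected 2500 while sa-learn still examines 0 messages, the file is probably fine and the difference lies in how sa-learn recognises the separator lines.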


Thanks.

Jerry