Re: Bayes DB does not grow anymore

2005-03-14 Thread Kai Schaetzl
GRP Productions wrote on Mon, 14 Mar 2005 00:32:42 +0200:

> You are right, I am using MailWatch. I just posted this output to be easy 
> for one to see the actual dates without having to convert.

That's okay, the problem just is one cannot be sure how accurate it is. Knowing 
that you use MS would have been useful, anyway :-)
(BTW: my version of Mailwatch can't show this, do you use a CVS version?)

> Here is the actual output:
>
> # /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic
> 0.000  0  3  0  non-token data: bayes db version 
> 0.000  0  49740  0  non-token data: nspam 
> 0.000  0  47167  0  non-token data: nham 
> 0.000  0 123325  0  non-token data: ntokens

I didn't look at this closely before, but I think this ratio indicates a
problem. For instance, this is from our own mail server (just getting our own
mail, not our clients'):

0.000  0  30089  0  non-token data: nspam
0.000  0  12515  0  non-token data: nham
0.000  0 1001630  0  non-token data: ntokens

See the number of tokens: we have roughly ten times yours with less learned
mail. That means that our db has many more tokens to qualify an email as ham or
spam. Also, your "hold time" is quite low; it's about a month. I think we have
tokens from even a year ago. That's maybe a bit too much, but I strongly
suggest upping your bayes_expiry_max_db_size to something like 500,000 or so.
Since you have a much higher flux of messages than we have on that machine, you
are literally "burning" your db to uselessness.
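One rough way to put a number on this comparison is tokens stored per learned message. The following is a minimal Python sketch, assuming the `--dump magic` column layout shown above and reading the fused value in the second dump as `0` followed by `1001630`; the function names are illustrative and not part of SpamAssassin.

```python
import re

# Matches one "non-token data" row of `sa-learn --dump magic` output;
# the third column holds the counter value.
ROW = re.compile(
    r"^\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham|ntokens)\s*$"
)


def parse_dump_magic(text):
    """Return a dict with the nspam, nham and ntokens counters found in text."""
    counts = {}
    for line in text.splitlines():
        # Tolerate mail quoting ("> ") in front of each row.
        match = ROW.match(line.lstrip("> ").strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def tokens_per_message(counts):
    """Stored tokens divided by the number of learned messages."""
    return counts["ntokens"] / (counts["nspam"] + counts["nham"])


# Figures copied from the two dumps quoted in this message.
GRP_DUMP = """\
0.000  0  49740  0  non-token data: nspam
0.000  0  47167  0  non-token data: nham
0.000  0 123325  0  non-token data: ntokens
"""

KAI_DUMP = """\
0.000  0  30089  0  non-token data: nspam
0.000  0  12515  0  non-token data: nham
0.000  0 1001630  0  non-token data: ntokens
"""

if __name__ == "__main__":
    for name, dump in (("GRP", GRP_DUMP), ("Kai", KAI_DUMP)):
        print(name, round(tokens_per_message(parse_dump_magic(dump)), 2))
```

With these figures the first database keeps about 1.3 tokens per learned message and the second about 23.5, which is the gap the paragraph above describes.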

> No it isn't. This is exactly the point I mentioned.

But you didn't prove it ;-)

> But as I said earlier, sa-learn claims it has learned, even from the web
> interface:
> >SA Learn: Learned from 1 message(s) (1 message(s) examined). 

And you learned by specifying the config file? I suspect that you are at least 
occasionally using two SA configurations, the one coming with MS and the one 
coming with SA.

> This is getting more suspicious: there is no bayes_journal file! 

Oh. Still possible, though. You don't need to have one, but on high-volume
systems it's highly recommended. Check your SA config (wherever it is :-) for
bayes_learn_to_journal 1. I don't know if it is 1 by default, though. What do
you have starting with bayes in your config file?

> -rw-rw-rw-  1 root nobody 1236 Mar 14 00:22 bayes.mutex 
> -rw-rw-rw-  1 root nobody 10452992 Mar 14 00:22 bayes_seen 
> -rw-rw-rw-  1 root nobody  5509120 Mar 14 00:02 bayes_toks 

bayes_seen is quite high. I haven't ever seen that it is higher than bayes_toks 
on our systems. But maybe that's normal for high volume systems, I don't know. 
On the Mailscanner list many people complain about very big bayes_seen files. 
Someone else on this list should comment on the size.

> I can assure you no one has touched anything inside this directory. If this
> is the reason for the problems I've been facing, is there a way to recreate 
> the file without having to lose my current data? (perhaps by copying the 
> above files somewhere, execute sa-learn --clear and some time later restore 
> the above files?)

Don't know if this would be of any help. As I said, I suspect you are using at
least two different bayes dbs. At least when you do it from the command line.
Run an "updatedb" and then "locate bayes" (this may not locate all files, for
instance not in /var!).
MS, of course, can only use one and doesn't have a chance of confusing that, so
when it uses SA, it learns and checks the same db. And so far that part seems
to be okay (except for the bigger size of bayes_seen, but as I said, this may
be normal for your setup, I really don't know). But you burn your tokens too
fast. At least that's what I think.


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





save debug output and errors

2005-03-14 Thread Ben Wylie
Hello All,

This is the first time I have used this list, so if I should have done it a
different way, or have said something wrong, I beg your forgiveness.

I have been running spamassassin for over a year now on my Windows 2003
system using perl 5.6.1. My SA version is 3.0.2.
I run Spamassassin from a batch file:

cd F:\perl\bin
set tmpfile=%1.tmp
f:\perl\bin\perl.exe spamassassin < %1 > %tmpfile%
move /Y %tmpfile% %1

Where %1 is the path to the email file.

I have been having problems with SA for a while, where it is running, but
clearly not functioning correctly, as the BAYES tests are not completing and
various other tests are not either.

Today I decided to try testing it by manually testing some files with the
debug option enabled.

My first question is about saving this debug info. Running this from the
command prompt means that the debug info goes past rapidly, and I lose most
of it as it scrolls off the top of the display. Is there an option to save
debug info to a file which I can look through in my own time?

From the parts of the debug that I have been able to look at, there are
various errors which I have been unable to work out a solution to, mainly
because I don't understand what they mean.

1) Argument "BODY" isn't numeric in addition (+) at
F:\Perl\site\lib/Mail/SpamAssassin/Conf.pm line 244.

2) debug: URIDNSBL: queries completed: 2 started: 0
debug: URIDNSBL: queries active:  at Sun Mar 13 14:19:17 2005
debug: done waiting for URIDNSBL lookups to complete
debug: running meta tests; score so far=14.882
Scalar found where operator expected at (eval 57) line 724, near "} $self"
(Missing operator before  $self?)
Failed to run meta SpamAssassin tests, skipping some: syntax error at (eval 57) line 724, near "} $self"

This second error stops SA from doing any more tests and it just scores the
email on the tests it has already done.

What is also confusing, is that different tests get done when I run the test
by hand, and by my mail server, using exactly the same batch file. For
example, the BAYES test is no longer done when run by my mailserver, but
when done by hand it is.

Eg.
By Mailserver:
X-Spam-Report: 
*  0.1 FORGED_RCVD_HELO Received: contains a forged HELO
*  0.5 INFO_TLD URI: Contains an URL in the INFO top-level domain
*  1.6 NUMERIC_HTTP_ADDR URI: Uses a numeric IP address in URL
*  1.3 HTML_IMAGE_ONLY_16 BODY: HTML: images with 1200-1600 bytes of words
*  0.0 HTML_60_70 BODY: Message is 60% to 70% HTML
*  2.7 URIBL_SBL Contains an URL listed in the SBL blocklist
*  [URIs: dhashsad.info]

By Hand, but with the same batch file:
X-Spam-Report: 
*  2.1 NUMERIC_HTTP_ADDR URI: Uses a numeric IP address in URL
*  5.0 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*  [score: 0.9976]
*  1.0 HTML_IMAGE_ONLY_16 BODY: HTML: images with 1200-1600 bytes of words
*  2.7 URIBL_SBL Contains an URL listed in the SBL blocklist
*  [URIs: dhashsad.info]
*  4.0 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist
*  [URIs: dhashsad.info]

Help in any or all of these matters, very much appreciated. If you think
there is a document which I have not read well enough which might contain
useful info, please do give me a link so I can look it up.

Thanks,
Ben





Re: Bayes DB does not grow anymore

2005-03-14 Thread GRP Productions
> That's okay, the problem just is one cannot be sure how accurate it is.
> Knowing that you use MS would have been useful, anyway :-)
> (BTW: my version of Mailwatch can't show this, do you use a CVS version?)

Indeed, this is the CVS version :-)

> See the number of tokens: we have roughly ten times yours with less learned
> mail. That means that our db has many more tokens to qualify an email as ham
> or spam.

This is perhaps because I have been using only 'mistake-based' training (i.e.
training only when false classification happens). However, this used to work
fine.

> Also, your "hold time" is quite low; it's about a month. I think we have
> tokens from even a year ago. That's maybe a bit too much, but I strongly
> suggest upping your bayes_expiry_max_db_size to something like 500,000 or
> so. Since you have a much higher flux of messages than we have on that
> machine, you are literally "burning" your db to uselessness.

So what would you suggest? I certainly don't want to lose everything that has
been learned till now.

> And you learned by specifying the config file? I suspect that you are at
> least occasionally using two SA configurations, the one coming with MS and
> the one coming with SA.

Nope, there is definitely only the one coming with MS. I never use SA from
the command line anyway.

> Oh. Still possible, though. You don't need to have one, but on high-volume
> systems it's highly recommended. Check your SA config (wherever it is :-)
> for bayes_learn_to_journal 1. I don't know if it is 1 by default, though.
> What do you have starting with bayes in your config file?

# grep bayes /opt/MailScanner/etc/spam.assassin.prefs.conf
# be created as /var/spool/spamassassin/bayes_msgcount, etc.
#bayes_path /var/spool/spamassassin/bayes
#bayes_file_mode 0600
bayes_path  /var/spool/MailScanner/bayes/bayes
bayes_file_mode 0666
# MailScanner: big bayes_toks.new files wasting space.
bayes_auto_expire 0
bayes_expiry_max_db_size 50
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
# use_bayes 0
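
The grep output above mixes commented-out lines with active ones, which makes it easy to misread which Bayes settings are actually in effect. A minimal Python sketch that keeps only the uncommented `bayes_*` directives, assuming the usual one-directive-per-line layout with `#` comments; the function name is illustrative and not part of SpamAssassin or MailScanner.

```python
def active_bayes_settings(config_text):
    """Collect uncommented bayes_* directives; repeated keys keep every value."""
    settings = {}
    for raw in config_text.splitlines():
        # Drop anything after "#" and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("bayes_"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(parts[0], []).append(value)
    return settings


# The lines quoted above from spam.assassin.prefs.conf.
PREFS = """\
# be created as /var/spool/spamassassin/bayes_msgcount, etc.
#bayes_path /var/spool/spamassassin/bayes
#bayes_file_mode 0600
bayes_path  /var/spool/MailScanner/bayes/bayes
bayes_file_mode 0666
# MailScanner: big bayes_toks.new files wasting space.
bayes_auto_expire 0
bayes_expiry_max_db_size 50
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
# use_bayes 0
"""

if __name__ == "__main__":
    for key, values in active_bayes_settings(PREFS).items():
        print(key, values)
```

Run against the lines above, this leaves `bayes_auto_expire 0` and `bayes_expiry_max_db_size 50` among the active settings, which are the two values most relevant to the expiry discussion in this thread.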

> Don't know if this would be of any help. As I said, I suspect you are using
> at least two different bayes dbs. At least when you do it from the command
> line. Run an "updatedb" and then "locate bayes" (this may not locate all
> files, for instance not in /var!).

I think there is only one.

> MS, of course, can only use one and doesn't have a chance of confusing
> that, so when it uses SA, it learns and checks the same db. And so far that
> part seems to be okay (except for the bigger size of bayes_seen, but as I
> said, this may be normal for your setup, I really don't know). But you burn
> your tokens too fast. At least that's what I think.

If I get it, you mean that the tokens are lost very quickly? I think I am
confused: if bayes works with tokens, why does it need nspam and nham? Or
are they just counters?

In general, do you think that setting bayes_expiry_max_db_size would be
enough?

One final thing: why, even if I manually expire, does the date of the last
expiration remain old?




Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-14 Thread Loren Wilton
> >   ...The person with two clocks is never really sure of
> > the current time.
>
> OT, but... above - *not* a good quote, but it sounds nice)
> To be `sure' of the time, you need at least three clocks (look at the
> documentation for ntp/ntpd).

It's part of a larger quote, to the effect that someone with one clock is
sure of the time, someone with two clocks isn't, and I forget what is
supposed to happen as you get more clocks.  Maybe you get back closer to the
assurance you had with a single cheap windup clock.  I originally came
across the quote on the site of someone who collected atomic clocks.

Loren



Re: save debug output and errors

2005-03-14 Thread Loren Wilton
You probably have some broken rules or other lines in your local rules file,
or possibly elsewhere in the rules files.  To catch those you want to run
"spamassassin --lint" without the debug stuff.  This will give you a
manageable output.

Assuming you are on NT of some flavor, "spamassassin <args> >outfile.txt 2>&1"
should catch the stderr output as well as the stdout output.
"<args>" is obviously whatever parameters you want to give, like "-D
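
The same capture can also be scripted. In shell terms the goal is `command >outfile.txt 2>&1`; the order of the two redirections matters, because stderr is pointed at whatever stdout refers to at that moment. A minimal Python sketch of the same idea, where the helper name `run_and_capture` is illustrative and the command list is whatever you would normally type:

```python
import subprocess


def run_and_capture(cmd, logfile):
    """Run cmd, writing stdout and stderr to one file; return the exit code."""
    with open(logfile, "w") as log:
        # stderr=subprocess.STDOUT sends stderr to the same place as stdout,
        # which is the equivalent of ">logfile 2>&1".
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, `run_and_capture(["spamassassin", "-D", "--lint"], "debug.txt")` would leave the debug output in `debug.txt`, assuming `spamassassin` is on the PATH; on a Windows setup like the one described earlier in this thread the command list would probably start with the path to `perl.exe` instead.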


Re: DCC in Spamassassin

2005-03-14 Thread Norman Zhang
Bill Randle wrote:
> I don't have rcddc either. With SpamAssassin, use dccifd as previously
> mentioned. Once you edited the dcc_conf file to enable DCCIFD, start it
> using the init program:
> # /etc/init.d/dccd start

Thanks. dccifd did start. However, I'm seeing the following error message:

dccifd[8337]: clock changed an impossible -1.080998 seconds

May I ask if this is critical?

Regards,
Norman Zhang


Razor Files Missing

2005-03-14 Thread Norman Zhang
Hi,
When I run
# amavisd debug-sa
I see the following errors. Do I need to create the missing files
manually? May I ask for a few pointers?
Regards,
Norman Zhang
 Razor-Log: No /var/lib/amavis/var/.razor/razor-agent.conf found, skipping.
 Razor-Log: No razor-agent.conf found, using defaults.
check[9585]: [ 2] [bootup] Logging initiated LogDebugLevel=9 to stdout
check[9585]: [ 5] computed razorhome=/var/lib/amavis/var/.razor, conf=,
ident=/var/lib/amavis/var/.razor/identity
check[9585]: [ 8] Client supported_engines: 4 8
check[9585]: [ 8]  prep_mail done: mail 1 headers=102, mime0=1376
check[9585]: [ 5] Can't read file
/var/lib/amavis/var/.razor/servers.discovery.lst: No such file or directory
check[9585]: [ 5] Can't read file
/var/lib/amavis/var/.razor/servers.nomination.lst: No such file or directory
check[9585]: [ 5] Can't read file
/var/lib/amavis/var/.razor/servers.catalogue.lst: No such file or directory



Re: Razor Files Missing

2005-03-14 Thread Bill Randle
On Sun, 2005-03-13 at 19:12 -0800, Norman Zhang wrote:
> Hi,
> 
> When I run
> 
> # amavisd debug-sa
> 
> I see the following errors. Do I need to create the missing files
> manually? May I ask for a few pointers?
> 
> Regards,
> Norman Zhang
> 
>   Razor-Log: No /var/lib/amavis/var/.razor/razor-agent.conf found, skipping.
>   Razor-Log: No razor-agent.conf found, using defaults.
> 
> check[9585]: [ 2] [bootup] Logging initiated LogDebugLevel=9 to stdout
> check[9585]: [ 5] computed razorhome=/var/lib/amavis/var/.razor, conf=,
> ident=/var/lib/amavis/var/.razor/identity
> check[9585]: [ 8] Client supported_engines: 4 8
> check[9585]: [ 8]  prep_mail done: mail 1 headers=102, mime0=1376
> check[9585]: [ 5] Can't read file
> /var/lib/amavis/var/.razor/servers.discovery.lst: No such file or directory
> check[9585]: [ 5] Can't read file
> /var/lib/amavis/var/.razor/servers.nomination.lst: No such file or directory
> check[9585]: [ 5] Can't read file
> /var/lib/amavis/var/.razor/servers.catalogue.lst: No such file or directory
> 

Did you run "razor-admin -create" after installing razor? It will create
the razor-conf and *.lst files. You will want to do this as the user
that runs amavisd (typically, amavis or vscan). Given where amavisd is
looking for the razor files, I would guess you may also need to use the
"-home" option. E.g., as root:
# su amavis -c "razor-admin -d -create -home /var/lib/amavis/var/.razor"


-Bill



Re: Razor Files Missing

2005-03-14 Thread Norman Zhang
> Did you run "razor-admin -create" after installing razor? It will create
> the razor-conf and *.lst files. You will want to do this as the user
> that runs amavisd (typically, amavis or vscan). Given where amavisd is
> looking for the razor files, I would guess you may also need to use the
> "-home" option. E.g., as root:
> # su amavis -c "razor-admin -d -create -home /var/lib/amavis/var/.razor"

Thanks Bill for your help. amavis is a system account, so I can't do su
amavis. I did

# razor-admin -d -create -home /var/lib/amavis/var/.razor

I see that razor keeps on trying to contact 66.151.150.12. I think I need
to allow outgoing UDP/2703 connections on my firewall?

Regards,
Norman Zhang
admin[9705]: [ 5] Razor Discovery Server 66.151.150.12 is unreachable
admin[9705]: [ 5] Couldn't talk to discovery servers.  Will force a 
bootstrap...
admin[9705]: [ 6] no discovery listfile: 
/var/lib/amavis/var/.razor//servers.discovery.lst
admin[9705]: [ 5] Finding Discovery Servers via DNS in the 
razor2.cloudmark.com zone
admin[9705]: [ 6] Found 1 Discovery Servers via DNS in the 
razor2.cloudmark.com zone
admin[9705]: [ 5] no listfile: 
/var/lib/amavis/var/.razor//servers.nomination.lst
admin[9705]: [ 6] no discovery listfile: 
/var/lib/amavis/var/.razor//servers.discovery.lst
admin[9705]: [ 5] Finding Discovery Servers via DNS in the 
razor2.cloudmark.com zone
admin[9705]: [ 6] Found 1 Discovery Servers via DNS in the 
razor2.cloudmark.com zone
admin[9705]: [ 8] Checking with Razor Discovery Server 66.151.150.12
admin[9705]: [ 6] No port specified, using 2703
admin[9705]: [ 5] Connecting to 66.151.150.12 ...


Upgrade...

2005-03-14 Thread Doug Wolfgram
I upgraded to 3.02 and everything is working fine except for one thing... I
always put spam from all users into the /var/mail directory with the
following lines in procmailrc:

:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*
/var/mail/spam

After reading the new doc, I also added this line to the top of procmailrc:

DROPPRIVS=yes

While spamassassin is working, all messages are being sent to the original
recipient instead of the spam box I set up. I've checked privs and all on
the file -- even set them to 777 for testing.

Anything else I can check to see why I can't re-direct spam to the spam 
mailbox?

D

_
"If you're not in e-business ... you're not in business.."
_
Doug Wolfgram
GRAFX Group, Inc.
Cell: 949.433.3641
http://www.gfx.com




RE: save debug output and errors

2005-03-14 Thread Ben Wylie
-----Original Message-----
> From: Loren Wilton [mailto:[EMAIL PROTECTED] 
> Sent: 14 March 2005 02:36
> To: users@spamassassin.apache.org
> Subject: Re: save debug output and errors
>
> You probably have some broken rules or other lines in your local rules
> file, or possibly elsewhere in the rules files.  To catch those you want to
> run "spamassassin --lint" without the debug stuff.  This will give you a
> manageable output.

Thanks for the help.
It pointed out some bad rules and other odd stuff, but there is one I don't
understand:

F:\Perl\bin>spamassassin log6.txt
Scalar found where operator expected at (eval 57) line 724, near "} $self"
(Missing operator before  $self?)
Failed to run meta SpamAssassin tests, skipping some: syntax error at (eval 57) line 724, near "} $self"

where should I look for this to fix?

Thanks,
Ben





Re: Razor Files Missing

2005-03-14 Thread Bill Randle
On Sun, 2005-03-13 at 20:03 -0800, Norman Zhang wrote:
> > Did you run "razor-admin -create" after installing razor? It will create
> > the razor-conf and *.lst files. You will want to do this as the user
> > that runs amavisd (typically, amavis or vscan). Given where amavisd is
> > looking for the razor files, I would guess you may also need to use the
> > "-home" option. E.g., as root:
> > # su amavis -c "razor-admin -d -create -home /var/lib/amavis/var/.razor"
> 
> Thanks Bill for your help. amavis is a system account, so I can't do su 
> amavis. I did
> 
> # razor-admin -d -create -home /var/lib/amavis/var/.razor
> 
> I see the razor keeps on trying to contact 66.151.150.12. I think I need 
> to allow outgoing UDP/2703 connection on my firewall?

I'd say that's a good guess that you need to open that up if it's
blocked. Because you ran razor-admin as root, the files were created with
ownership by root. You will want to do a chown on them to the amavis
account once you have everything created:
# chown -R amavis.amavis /var/lib/amavis/var/.razor
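
Before changing firewall rules it can help to confirm whether the port is reachable at all. A minimal Python sketch of a TCP connect test follows; the function name is illustrative, and it assumes Razor2 talks to port 2703 over TCP rather than UDP, which is how I understand the protocol but is worth verifying against the Razor documentation.

```python
import socket


def tcp_port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    # Address and port taken from the razor-admin log quoted in this thread.
    print(tcp_port_reachable("66.151.150.12", 2703))
```

A `False` result only says the connection did not succeed from this machine; it does not distinguish a local firewall from a server that is down.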

-Bill

> Regards,
> Norman Zhang
> 
> admin[9705]: [ 5] Razor Discovery Server 66.151.150.12 is unreachable
> admin[9705]: [ 5] Couldn't talk to discovery servers.  Will force a 
> bootstrap...
> admin[9705]: [ 6] no discovery listfile: 
> /var/lib/amavis/var/.razor//servers.discovery.lst
> admin[9705]: [ 5] Finding Discovery Servers via DNS in the 
> razor2.cloudmark.com zone
> admin[9705]: [ 6] Found 1 Discovery Servers via DNS in the 
> razor2.cloudmark.com zone
> admin[9705]: [ 5] no listfile: 
> /var/lib/amavis/var/.razor//servers.nomination.lst
> admin[9705]: [ 6] no discovery listfile: 
> /var/lib/amavis/var/.razor//servers.discovery.lst
> admin[9705]: [ 5] Finding Discovery Servers via DNS in the 
> razor2.cloudmark.com zone
> admin[9705]: [ 6] Found 1 Discovery Servers via DNS in the 
> razor2.cloudmark.com zone
> admin[9705]: [ 8] Checking with Razor Discovery Server 66.151.150.12
> admin[9705]: [ 6] No port specified, using 2703
> admin[9705]: [ 5] Connecting to 66.151.150.12 ...



Re: save debug output and errors

2005-03-14 Thread Theo Van Dinter
On Mon, Mar 14, 2005 at 05:03:05AM -, Ben Wylie wrote:
> Scalar found where operator expected at (eval 57) line 724, near "} $self"
> (Missing operator before  $self?)
> Failed to run meta SpamAssassin tests, skipping some: syntax error at (eval
> 57)
> line 724, near "} $self"
> 
> where should I look for this to fix?

Looks like you still have a bad meta rule somewhere...

-- 
Randomly Generated Tagline:
 Bender: This is the Brooklyn-bound B train making local stops at wherever 
  the hell I feel like, watch for the closing doors.




MRTG SPAM SYSLOG ?

2005-03-14 Thread ip.guy
hi all
is anyone using a tool that can parse "/var/log/messages" to find 
identified SPAM and is able to then build MRTG graphs ?

i was using a tool that could do this a while ago but have totally 
forgotten the name of the project

any help appreciated


Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-14 Thread List Mail User
>...
>From: "List Mail User" <[EMAIL PROTECTED]>
>
>> >   ...The person with two clocks is never really sure of
>> > the current time.
>> 
>> OT, but... above - *not* a good quote, but it sounds nice)
>> To be `sure' of the time, you need at least three clocks (look at the
>> documentation for ntp/ntpd).
>

jdow wrote (quite correctly) at about 13:15:23 on Sun, 13 Mar 2005
>And even that is a gross oversimplification. (And I COULD setup my
>system at one time, at least, to be approximately 1 second off by
>picking the wrong ntp servers. Seems GTEI.NET's time server used for
>their DNS machines and first hop routers was off, considerably.)
>
>Reading the ntp/ntpd documentation is desirable in any case if one is
>interested in precision time keeping.
>
>{^_-}
>
Of course, you are correct; I was just "making fun" of the original
quote.  I actually use 3 GPS receivers on three different machines and,
between the servers, in total use an overlapping set of 12 public peers/servers.
I still have to "fudge" (for the non-ntpd literate: RTFM) due to bad skew in
my cheap receivers, and actually find the best time source for many years has
been clepsydra.dec.com (long ago and far away, it was timekeeper.dec.com).  I
also set my own GPSs to stratum 1, because the skew is so bad.  Unfortunately,
for nearly ten years all the U. Del. servers have been badly
overloaded, and True-Time no longer allows free public access.

So, yes, I do take "time" seriously; But you are more than just
simply correct when you point out that my misstatement was almost as
incorrect as the quote I was poking fun at. (I've seen the lousy GTE servers;
until recently SBC ran some really good, but unpublished servers - but they
seem to have been taken down, or at least don't answer public queries as of
last month - they were the "old" pacbell servers - and the SBC published
servers "false tick" about once an hour).

Basically, I'm happy with keeping my net (under a hundred machines),
synced within 5-8 msec. and an estimated error on all servers (not just time
servers) under 2 msecs.  Better than that would mean buying "real" equipment.

At this moment, on my primary time server (i.e. "prefer"'d by most
of my machines), I'm within 12 msec for all the public peers/servers I use
except for otc2.psu.edu (who I should drop), louie.udel.edu, and (one of the
rotating x.pool.ntp.org servers) splinter.bowdoin.edu all of whom have "false
tick"'d in the past 25 minutes - for most of the rest I'm within 1.5 msec and
my own three time servers are all within 600 usec of each other (two on the
same subnet, one two subnets away).

Yes, what ntp does used to seem like magic; now I think the best
improvement will come with a better filtering function for "smoothing"
(e.g. a Widrow-style adaptive filter instead of the simple FIR used).
Several people have shown that just adding taps doesn't provide any
improvement (I seem to remember an old paper showing that six or seven
taps were as good as or better than the eight-tap filter actually used, but I
haven't done any `real' DSP work in nearly a decade).  Also IIRs look better
at times, but the `bad' cases entirely rule them out (now if you used a
Widrow filter to generate the IIR coefficients and did a fall back to FIR
whenever the output approached instability,...  I think I'll leave these
things to people like Eric Fair, etc. who have more time to spend thinking
about it than I do).

Anyway, I'd love to discuss this off list, but this is way OT for
the SA group.

Bye,

Paul Shupak
[EMAIL PROTECTED]


Re: Upgrade...

2005-03-14 Thread Loren Wilton
You've checked the obvious?  There is an X-Spam-Level header, and it does
have a bunch of asterisks?
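
That check can be scripted against a saved copy of a message. A minimal Python sketch, using the standard `email` module; the function names are illustrative, and the default of nine stars is my count of the escaped asterisks in the recipe quoted earlier in this thread, so adjust it if your recipe differs.

```python
import email
import re


def spam_level_stars(raw_message):
    """Return the number of leading asterisks in X-Spam-Level, or None if absent."""
    msg = email.message_from_string(raw_message)
    level = msg.get("X-Spam-Level")
    if level is None:
        return None
    # r"\**" always matches, possibly with an empty string.
    return len(re.match(r"\**", level.strip()).group(0))


def recipe_would_match(raw_message, min_stars=9):
    """True if the header exists and carries at least min_stars asterisks."""
    stars = spam_level_stars(raw_message)
    return stars is not None and stars >= min_stars
```

If `spam_level_stars` returns `None` for mail that spamd reports as spam, the header is never reaching procmail and the recipe itself is not the problem.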

Loren



Re: save debug output and errors

2005-03-14 Thread Loren Wilton
> where should I look for this to fix?

Theo has the answer for that one - a bad meta rule somewhere.

Given that --lint seemingly didn't catch it, and this pure perl error is
less than descriptive of the problem, once you find the rule in question, I
think it deserves a Bugzilla entry for failure to check sufficiently for bad
meta rules in lint.  You will need to supply the bad meta rule in question,
and possibly the rules it depends on.

If you don't have a Bugzilla account I'll enter it for you.

Loren



Re: MRTG SPAM SYSLOG ?

2005-03-14 Thread Alan Premselaar
ip.guy wrote:
> hi all
>
> is anyone using a tool that can parse "/var/log/messages" to find
> identified SPAM and is able to then build MRTG graphs ?
>
> i was using a tool that could do this a while ago but have totally
> forgotten the name of the project
>
> any help appreciated

I've used graphdefang in conjunction with MIMEDefang... although I think
you can run it independently of MIMEDefang.

alan


Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-14 Thread List Mail User
...
>It's part of a larger quote, to the effect that someone with one clock is
>sure of the time, someone with two clocks isn't, and I forget what is
>supposed to happen as you get more clocks.  Maybe you get back closer to the
>assurance you had with a single cheap windup clock.  I originally came
>across the quote on the site of someone who collected atomic clocks.
>
>Loren


Sorry, I was just poking fun at it (and by reference you).  I
have many quotes I like along the lines of "anybody sure of anything
is likely wrong" - It all lies in the degree of certainty of a belief;
Absolute certainty is for fanatics and fools (I just try for "correct"
more often than not).

Now if Loren were actually short for Lorentz...

Paul Shupak
[EMAIL PROTECTED]

P.S. I just wrote an overly long-winded technical response to a previous
poster who properly pointed out my statement was not really an improvement
on yours.


Re: [RD] evilnumbers update & changes

2005-03-14 Thread Martin Hepworth
Matt,

MyRDJ is not downloading the files, as it can't get the file sizes for some
reason...

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300
Matt Yackley wrote:
> Hi all,
>
> I've released a new version of evilnumbers and there are several changes in
> the new version.
>
> Ruleset name change:
> In order to get this old setup in line with current SARE standards the name
> of the ruleset has changed from evilnumbers.cf to 70_sare_evilnum*.cf
>
> Multiple files:
> The set has been split into three different files:
> 70_sare_evilnum0.cf = hits 0 ham during SARE masschecks
> 70_sare_evilnum1.cf = hits a few ham, but most folks consider these messages spam
> 70_sare_evilnum2.cf = hit 0 spam & ham during last masscheck, but may come back
>
> RulesDuJour:
> A new version of RDJ will be released soon to handle these changes, but here
> is a manual fix.
> In your RDJ or MyRDJ config file locate the evilnumbers entry and change the
> following lines.
> ADD = OLD_CF_FILES[8]="evilnumbers.cf"
> CHANGE = CF_FILES[8]="70_sare_evilnum0.cf"
> CHANGE = CF_URLS[8]="http://www.rulesemporium.com/rules/70_sare_evilnum0.cf";
> Info on adding files 1 & 2 to RDJ
> http://www.exit0.us/index.php?pagename=RulesDuJourRuleSets
>
> Language files:
> If you use a local language file 98_text_**_evilnumbers.cf, please delete
> this file. The structure of the rules may change soon; if/when that happens
> I'll release updated language files.
>
> Cheers,
> matt





Re: MRTG SPAM SYSLOG ?

2005-03-14 Thread sa-users
On 14.03.2005 at 07:54 you wrote:
> hi all
>
> is anyone using a tool that can parse "/var/log/messages" to find
> identified SPAM and is able to then build MRTG graphs ?


I use Mailgraph
http://people.ee.ethz.ch/~dws/software/mailgraph/

it uses the RRD to store the data.
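
For a homegrown alternative, the counting step is small. A minimal Python sketch, assuming spamd syslog lines of the form seen in other posts of this digest (`identified spam (11.2/5.0) for ...` and `clean message (4.6/5.0) for ...`), each on a single line; the function names are illustrative. The four-line report follows what I understand to be MRTG's convention for external scripts (two values, an uptime string, a target name), which should be checked against the MRTG documentation.

```python
import re

# One spamd verdict per log line: "identified spam (score/threshold)" or
# "clean message (score/threshold)".
VERDICT = re.compile(
    r"spamd\[\d+\]: (identified spam|clean message) "
    r"\((-?\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\)"
)


def count_verdicts(lines):
    """Count spam and ham verdicts in an iterable of syslog lines."""
    counts = {"spam": 0, "ham": 0}
    for line in lines:
        match = VERDICT.search(line)
        if match:
            key = "spam" if match.group(1) == "identified spam" else "ham"
            counts[key] += 1
    return counts


def mrtg_report(counts, name="spamd"):
    """Format counts as four lines: spam count, ham count, uptime, name."""
    return "\n".join([str(counts["spam"]), str(counts["ham"]), "unknown", name])


if __name__ == "__main__":
    with open("/var/log/messages") as log:
        print(mrtg_report(count_verdicts(log)))
```

The log path is the one named in the question; on systems where spamd logs elsewhere (for example a separate mail log) it would need to change.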

Lars




Re: Upgrade... + other (perl?) problems

2005-03-14 Thread sa-users
I upgraded from 2.64 to 3.0.2 via CPAN with no problems.

Yes, I read
http://spamassassin.apache.org/full/3.0.x/dist/UPGRADE
and did change my local.cf.

1) Now mail doesn't get tagged as spam
==
/var/log/mail
Mar 14 12:34:33 ns spamd[27959]: connection from localhost.localdomain
[127.0.0.1] at port 43500
Mar 14 12:34:33 ns spamd[26710]: processing message
<[EMAIL PROTECTED]> for web321p1:104.
Mar 14 12:34:35 ns spamd[26710]: identified spam (11.2/5.0) for
web321p1:104 in 2.0 seconds, 2403 bytes.

But procmail can't filter anything because there are no X-Spam headers
in the mail.

/var/log/procmail:

From [EMAIL PROTECTED]  Mon Mar
14 12:34:33 2005
 Subject: Approved mortage rate
  Folder: /var/spool/mail/web321p1


2) Weird logfile entries
Mar 13 01:16:18 ns spamd[28893]: processing message
<[EMAIL PROTECTED]> for web321p1:104.
Mar 13 01:16:20 ns spamd[28893]: Use of uninitialized value in
concatenation (.) or string at
/usr/lib/perl5/vendor_perl/5.8.3/Mail/SpamAssassin/NoMailAudit.pm line
184.
Mar 13 01:16:20 ns spamd[28893]: Use of uninitialized value in pattern
match (m//) at
/usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/PerMsgStatus.pm line
875.
Mar 13 01:16:20 ns spamd[28893]: clean message (4.6/5.0) for
web321p1:104 in 1.7 seconds, 4293 bytes.

This does not happen with every message.

Any suggestions?


On 14.03.2005 at 05:12 you wrote:
> I upgraded to 3.02 and everything is working fine except for one thing... I
> always put spam from all users into the /var/mail directory with the
> following lines in procmailrc
>
> :0:
> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*
> /var/mail/spam
>
> After reading the new doc, I also added this line to the top of procmailrc...
>
> DROPPRIVS=yes
>
>




RE: save debug output and errors

2005-03-14 Thread Ben Wylie
> > where should I look for this to fix?
>
> Theo has the answer for that one - a bad meta rule somewhere.
>
> Given that --lint seemingly didn't catch it, and this pure perl error is
> less than descriptive of the problem, once you find the rule in question,
> I think it deserves a Bugzilla entry for failure to check sufficiently for
> bad meta rules in lint.  You will need to supply the bad meta rule in 
> question, and possibly the rules it depends on.
>
> If you don't have a Bugzilla account I'll enter it for you.

Thanks again for your help.
I found that the error was in a ruleset I took from:
http://www.rulesemporium.com/rules/70_sare_oem.cf

However I was a bit stupid and overwrote the initial set with a fresh one
from that link, and the error has gone away. So I'm afraid I have lost the
file which had the error, so am not in a position to find the problematic
rule. Sorry about that.

Having managed to capture the errors which pop up when my mailserver runs
the spamassassin batch script, it is coming up with two errors:

Argument "BODY" isn't numeric in addition (+) at
F:\Perl\site\lib/Mail/SpamAssassin/Conf.pm line 244.

bayes: bayes db version 2 is not able to be used, aborting! at
F:\Perl\site\lib/Mail/SpamAssassin/BayesStore/DBM.pm line 160.

When I run the exact same batch script by hand, I only get the first error,
and the bayes function works.

Does this "bayes db version 2 is not able to be used" error mean something
specific or is it just a general bayes error?

Thanks
Ben





Re: MRTG SPAM SYSLOG ?

2005-03-14 Thread Ed Kasky
At 10:54 PM Sunday, 3/13/2005, ip.guy wrote -=>
> is anyone using a tool that can parse "/var/log/messages" to find
> identified SPAM and is able to then build MRTG graphs ?
>
> i was using a tool that could do this a while ago but have totally
> forgotten the name of the project
>
> any help appreciated

Have a look at http://users.2z.net/rpuhek/scripts_public/spamd/
. . . . . . . . . . . . . . . . . .
Randomly Generated Quote (428 of 475):
The secret of life is honesty and fair dealing.
If you can fake that, you've got it made.   -Groucho Marx


Message Processing Platform (MPP) Free Edition launched

2005-03-14 Thread Rob Kudyba
Message Partners has launched a free edition of our Message Processing 
Platform (MPP) for those who want to use SpamAssassin and ClamAV with
Sendmail, QMail, Postfix, CGPro or SurgeMail email servers.  MPP Free 
Edition includes our comprehensive Webmin module, most of the features 
of MPP-LE and most of the security and utility features of MPP-LE 
including adding disclaimers, blocking extensions, quarantine 
management, log monitor, file size limitations, mime error handling,
many security checks, and
more.  MPP Free Edition is tons faster than many of the perl interfaces
to SpamAssassin or ClamAV and will greatly reduce the load on your 
servers if you are using perl tools for this purpose and you have a
heavily loaded email server.   MPP Free Edition comes with mailing list 
support only, [EMAIL PROTECTED], and zero warranty.  We are starting
with Linux, FreeBSD, and OS X support.

Download and give it a try at
http://messagepartners.com/products/mpp_free_edition.html


Re: Bayes DB does not grow anymore

2005-03-14 Thread Kai Schaetzl
GRP Productions wrote on Mon, 14 Mar 2005 03:41:40 +0200:

> Indeed, this is the CVS version :-) 

I have been trying to get something from CVS for several days now, no luck.

> This is perhaps because I have been using only 'mistake-based' training (ie 
> training only when false classification happens). However this used to work 
> fine. 

Bayes needs constant training, but this doesn't mean it needs any manual 
training. Once it's up and running and "well-greased" it should take care of 
itself by auto-learning (bayes_auto_learn 1, don't know if on by default). 
About 70 or 80% of our spam and ham (especially the spam) is autolearned.

>  
> >your "hold time" is quite low, it's about a month. I think we have tokens
> >from even a year ago. That's maybe a bit too much, but I strongly suggest
> >upping your bayes_expiry_max_db_size to something like 500000 or so. Since
> >you have a much higher flux of messages than we have on that machine you
> >are literally "burning" your db to uselessness.
>  
> So what would you suggest? I certainly don't want to lose everything that has 
> been learned till now. 

Actually, with those "few" tokens you won't lose much if you throw it away ;-) 
As I said, upping that should help; no need to throw it away unless you think 
that's easier (if most spam you get scores at BAYES_50 it might be better to 
start over than to convince the db that it's spam).

> Nope, there is definitely only the one coming with MS. I never use SA from 
> the command line anyway.

Well, let's go back:
you sa-learn a message, it says it learned, you dump magic and see there's no 
change, you look in the directory and there's no journal. There *has* to be at 
least one additional Bayes db. Or something happens which I haven't heard of in 
my about three years of using SA+Bayes. What's the output of "sa-learn --dump 
magic"? Don't specify a config file!
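
One way to carry out the check described above is to save the "--dump magic" output before and after learning a message and compare the counters. A minimal Python sketch, assuming the output keeps the column layout quoted in this thread (value in the third column, label after "non-token data:"); the function names are made up for illustration:

```python
import re

# Matches lines such as: "0.000  0  49740  0  non-token data: nspam"
MAGIC_LINE = re.compile(r"\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")

def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value."""
    values = {}
    for line in text.splitlines():
        m = MAGIC_LINE.match(line)
        if m:
            values[m.group(2)] = int(m.group(1))
    return values

def changed_counters(before, after, keys=("nspam", "nham", "ntokens")):
    """Return {label: (old, new)} for every counter that differs."""
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }

BEFORE = "0.000  0  49740  0  non-token data: nspam\n0.000  0  47167  0  non-token data: nham\n"
AFTER = "0.000  0  49741  0  non-token data: nspam\n0.000  0  47167  0  non-token data: nham\n"

print(changed_counters(parse_dump_magic(BEFORE), parse_dump_magic(AFTER)))
```

If the result is empty although sa-learn reported "Learned from 1 message(s)", the dump and the learn run were most likely looking at different databases, which is the suspicion raised here.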
 
> bayes_path  /var/spool/MailScanner/bayes/bayes 

and what's in your /etc/mail/spamassassin/local.conf?

> bayes_auto_expire 0
ok, that means it won't expire. Of course, if it doesn't grow this isn't 
necessary ... ;-)

> bayes_expiry_max_db_size 50
I assume you just added/changed that?

> If I get it you mean that the tokens are lost very quickly?

Yes. However, now that I know that your bayes_auto_expire is off we have a 
different case. Since when has it been off? Since Feb. 11, as your dump magic 
suggests? Your oldest token is from Feb. 2. So that either means you started 
the db that day or you are burning your tokens in 10 days. That's one problem; 
upping to a higher ceiling, as you already did, should take care of that. The 
other problem is that it's apparently not growing. One of the reasons is, of 
course, that you only learn from mistakes. So, how often is that done? How many 
do you actually add this way? The second part of this other problem is that 
even when you learn, it doesn't seem to learn. I don't see any possibility 
other than that it uses different dbs.
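
The dates mentioned above can be reproduced from the raw numbers in the dump GRP Productions posted in this thread. A small Python sketch, assuming the atime values are Unix epoch seconds (which agrees with the Feb. 2 and Feb. 11 dates quoted here):

```python
from datetime import datetime, timezone

# Values copied from the --dump magic output posted in this thread.
MAGIC = {
    "oldest atime": 1107319073,
    "newest atime": 1110636450,
    "last journal sync atime": 1108137790,
    "last expiry atime": 1108129534,
}
EXPIRE_ATIME_DELTA = 804361  # seconds

def as_utc(epoch):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)

for label, epoch in MAGIC.items():
    print(f"{label:24} {as_utc(epoch):%Y-%m-%d %H:%M:%S} UTC")

# The delta is how far back the last expiry run kept tokens.
print(f"last expire atime delta: {EXPIRE_ATIME_DELTA / 86400:.1f} days")
```

The delta works out to a bit over nine days, which is where the "burning your tokens in 10 days" estimate comes from.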

> I think I am confused: if bayes works with tokens, why does it need nspam
> and nham? Or are they just counters? 

It's just the number of spam and ham messages you have trained it on. Yes, it's 
more or less informational only.

>  
> In general, do you think that setting bayes_expiry_max_db_size would be 
> enough? 

To cure the fast expiration, yes, but you didn't expire for the last 30 days, 
anyway.

> One final thing: why, even if I manually expire, does the date of last
> expiration remain old?

Same reason as above: you work on different dbs. What does the expire output 
show?


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Re: [SURBL-Discuss] Fw: TKO Notice: Urgent Fraud Investigation

2005-03-14 Thread Jeff Chan
On Thursday, February 17, 2005, 4:46:28 PM, Jeff Chan wrote:
> IMO The correct answer is for eBay not to have an open redirector
> or for them to protect it better, for example as Matthew suggests.

> We could ask them to follow the lead of other redirection sites and
> use SURBLs to check the URIs:

>   http://www.surbl.org/redirect.html

> Jeff C.

Kevin McGrail reports that eBay has closed their open redirector.

Jeff C.
--
"If it appears in hams, then don't list it."



Re: MRTG SPAM SYSLOG ?

2005-03-14 Thread Matias Lopez Bergero
[EMAIL PROTECTED] wrote:
> On 14.03.2005 at 07:54 you wrote:
> > hi all
> > is anyone using a tool that can parse "/var/log/messages" to find
> > identified SPAM and is able to then build MRTG graphs ?
>
> I use Mailgraph
> http://people.ee.ethz.ch/~dws/software/mailgraph/

I use Mailgraph too.
It works very well. The only thing that isn't working is the bounced 
graph. Don't know why.

BR,
Matías.



feeding bayes

2005-03-14 Thread ChupaCabra
I just noticed that some of my users are attaching multiple spams in one 
email to my "spam" user for training.  Aside from having to have 200 
spam/ham mails and the possibility of the messages being too large, would 
this have an effect on training?

Thanks.
--
Michael H. Collins  Admiral, Penguinista Navy
http://linuxlink.com
/"\ASCII Ribbon Campaign
\ / No HTML/RTF in email
x   No Word docs in email
/ \ Respect for open standards
In a related story, the IRS has recently ruled that 
the cost of Windows upgrades can NOT be deducted 
as a gambling loss.




Re: [RD] evilnumbers update & changes

2005-03-14 Thread Chris Thielen
Matt Yackley wrote:
Hi all,
I've released a new version of evilnumbers and there are several 
changes in the new
version.



RulesDuJour:
A new version of RDJ will be released soon to handle these changes, 
but here is a
manual fix.

I've updated RDJ with the new names for evilnumbers.  There are now 
three names available, EVILNUMBERS, EVILNUMBERS1, and EVILNUMBERS2.   
RDJ users should receive RDJ 1.19 during the next update (as usual, RDJ 
does not automatically update itself, only download the new version for 
you).


Cheers,
matt








Re: [RD] evilnumbers update & changes

2005-03-14 Thread Chris Thielen
Hi Martin,
Martin Hepworth wrote:
 Matt
 myrdj not downloading the files as it can't get the file sizes for 
some reason...
Can you give me the error messages?  I just downloaded the new 
evilnumbers using RDJ 1.19 (which I just uploaded) and it went off 
without a hitch.  I use curl (not wget), by the way.





bayesian tokens in text format?

2005-03-14 Thread Paul Reilly

Is it possible to dump the bayesian tokens in
human readable format still? It was quite useful
but since 3.0.x they seem to be base64 encoded or
some other way encoded. I couldn't see any sa-learn
option, or any FAQ entry about it.
Thanks
Paul



Re: feeding bayes

2005-03-14 Thread Matt Kettler
At 11:22 AM 3/14/2005, ChupaCabra wrote:
> I just noticed that some of my users are attaching multiple spams in one
> email to my "spam" user for training.  Aside from having to have 200
> spam/ham mails and the possibility of the messages being too large, would
> this have an effect on training?

Well, first, if you aren't stripping the attachments and training those, 
your whole system is flawed regardless of how many spams they send.

You cannot train on messages forwarded by or sent by one of your users. 
SA's bayes engine learns message headers, so you MUST strip off their 
message prior to training or SA will learn that your users are sources of 
spam. Not a good thing.

sa-learn really needs original messages, with original headers.

Now, if you're stripping attachments and training the attachments, it 
really shouldn't matter how many they attach, unless your script that 
strips them is broken...
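
As an illustration of the stripping step, here is a minimal Python sketch that pulls every message/rfc822 attachment out of a forwarded mail so that each original can be saved and trained separately. It assumes the users forward "as attachment"; the function name is made up, and it is not the script anyone in this thread is using:

```python
from email import message_from_bytes

def attached_messages(wrapper_bytes):
    """Return each message/rfc822 attachment of a forwarded mail as bytes.

    The wrapper's own headers (the forwarding user's From, Received, ...)
    are left behind; only the attached originals are returned.
    """
    wrapper = message_from_bytes(wrapper_bytes)
    originals = []
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part carries exactly one embedded message.
            inner = part.get_payload(0)
            originals.append(inner.as_bytes())
    return originals
```

Each returned item could then be written to its own file and handed to sa-learn. Note that walk() also descends into the attached messages, so a forward nested inside a forward would be returned as well.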





Feeding Bayes aswell

2005-03-14 Thread Andreas Rust
Hello,
we are mostly using Eudora (Windows version) and it's saving "Junk" emails 
(as junked by Eudora itself) into
an .mbx file.
Even though the .mbx file looks like an ok-formatted mbox file, it carries:

From [EMAIL PROTECTED] Tue Jan 18 15:28:02 2005

in front of the normal headers.
Such as:
From [EMAIL PROTECTED] Tue Jan 18 15:28:02 2005   --- the added line
Return-path: <[EMAIL PROTECTED]>   --- the header as we normally expect it ...
Envelope-to: etcetcetcetc

If I feed that to spamassassin, does that influence scoring in some way or 
would it be ignored completely?
(Or even raise a problem ? :) )
After all, that is in no way a valid header line.

thx for any pointers
Andreas Rust -   webnova GmbH
[EMAIL PROTECTED]  -   www.webnova.de
Tel:  +49 (0)700 - 20 30 7000
Fax:  +49 (0)700 - 20 30 8000
+:--:+
 www.Synergien-Nutzen.de
 Gemeinsam sind wir stark...


Re: bayesian tokens in text format?

2005-03-14 Thread Matt Kettler
At 11:46 AM 3/14/2005, Paul Reilly wrote:
> Is it possible to dump the bayesian tokens in
> human readable format still?

No.
In sa 3.0+ they are base-64 encodings of the SHA1 hash of the token. The 
hash is for all practical purposes not reversible.

This is done in part for privacy.. examining the text of a bayes DB tells a 
lot about the email someone receives.. Examining the sha1 hashes tells you 
little about their email.

It's also done for speed... SHA1 hashes are fixed size, which helps 
optimize access to the database.

However, it does have the drawback of making it difficult to inspect the 
database manually for weirdness, but really, 9 times out of 10, this 
inspection tends to lead to monkeying with the bayes DB more than you should.

Now, if you have a specific message that's a problem, running it through 
spamassassin -D will show the tokens matched in text format. So, if you 
have a specific problem, you can still use this to diagnose what's going on.
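
To illustrate the two properties mentioned above (fixed size, and no practical way back to the text), here is a small Python sketch using the standard hashlib module. It only shows plain SHA1 over a token string; how SpamAssassin itself derives, truncates, or encodes the stored key is not shown in this thread and is not reproduced here:

```python
import hashlib

def token_digest(token):
    """Hex SHA1 digest of a token string; always 40 hex characters."""
    return hashlib.sha1(token.encode("utf-8")).hexdigest()

# Tokens of very different lengths all produce a digest of the same size.
for tok in ("a", "unsubscribe", "a-much-longer-token-than-the-others"):
    print(f"{tok!r:40} -> {token_digest(tok)}")
```

Given only a digest, the practical way to learn which token produced it is to hash candidate tokens and compare.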



Re: [RD] evilnumbers update & changes

2005-03-14 Thread Martin Hepworth
Chris
updated the rules_du_jour, removed the new defs from myrdj and things 
work now.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300
Chris Thielen wrote:
Hi Martin,
Martin Hepworth wrote:
 Matt
 myrdj not downloading the files as it can't get the file sizes for 
some reason...
Can you give me the error messages?  I just downloaded the new 
evilnumbers using RDJ 1.19 (which I just uploaded) and it went off 
without a hitch.  I use curl (not wget), by the way.

**
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.
This footnote confirms that this email message has been swept
for the presence of computer viruses and is believed to be clean.   
**


Re: Feeding Bayes aswell

2005-03-14 Thread Matt Kettler
At 11:52 AM 3/14/2005, Andreas Rust wrote:
we are mostly using Eudora (Windows version) and it's saving "Junk" emails 
(as junked by Eudora itself) into
an .mbx file.


You can't train SA on most messages from Eudora's .mbx files. Speaking as a 
user of Eudora, Eudora completely destroys many important parts of a 
message when it stores it in the mbox, and it cannot be reconstructed.

The biggest problem is that Eudora mangles all multipart messages in a 
non-reversible manner. It only ever saves one mime section in the mbox, and 
strips or discards everything else. For example, in multipart/alternative 
messages the plain segment is discarded and the html segment is saved. That 
plain segment is gone, and is not saved anywhere. Anything with embedded images 
has the embedded images stripped out and saved as separate files. Ditto for 
attachments.

The only messages you can reconstruct are single-part text/plain or 
text/html messages. For those, you can use a script that converts Eudora 
mbx format into standard unix mbox format, such as eudora2unix.pl.
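
For the single-part messages that do survive conversion, the leading "From " line asked about earlier in this thread is what standard mbox readers use as the separator between messages; it is not treated as a header. A small sketch with Python's mailbox module, using a made-up sample message (the addresses and the helper name are illustrative only):

```python
import mailbox
import os
import tempfile

SAMPLE = (
    "From sender@example.com Tue Jan 18 15:28:02 2005\n"
    "Return-path: <sender@example.com>\n"
    "Envelope-to: someone@example.org\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)

def read_mbox(text):
    """Write text to a temporary mbox file; return (separator, headers) pairs."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbx")
        with open(path, "w", newline="\n") as fh:
            fh.write(text)
        box = mailbox.mbox(path, create=False)
        try:
            # get_from() is the text after "From " on the separator line.
            return [(msg.get_from(), dict(msg.items())) for msg in box]
        finally:
            box.close()

for separator, headers in read_mbox(SAMPLE):
    print("separator:", separator)
    print("headers:  ", sorted(headers))
```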



Re: feeding bayes

2005-03-14 Thread ChupaCabra
Thanks.  Glad I haven't started feeding it yet.  So what I need to do is
look at the attachments, save them to a file, and then train on that,
correct?  That makes sense.  Maybe that is why my old SA instance got
kinda weird in the end.
Thanks
Matt Kettler wrote:
Well, first, if you aren't stripping the attachments and training 
those, your whole system is flawed regardless of how many spams they 
send.

You cannot train on messages forwarded by or sent by one of your 
users. SA's bayes engine learns message headers, so you MUST strip off 
their message prior to training or SA will learn that your users are 
sources of spam. Not a good thing.

sa-learn really needs original messages, with original headers.
Now, if you're stripping attachments and training the attachments, it 
really shouldn't matter how many they attach, unless your script that 
strips them is broken...




plugins and parrallelization

2005-03-14 Thread Eric A. Hall

It seems that the plugin architecture only allows a single pass/fail
result, so if you want to have multiple tests with different shades of
results, you have to call the plugin multiple times. Is that right?

Over the weekend I banged together a preliminary ldapBlacklist.pm plugin
which lets the master process query an ldap server for whitelist or
blacklist flags associated with the connecting SMTP client's reverse DNS,
the HELO identifier, the mail-from address, the From address, and so
forth... The problem is that each of these tests has to do a fair amount
of processing with some significant serialization (ie, DNS lookup for SRV
RRs, DNS lookup for ldap server, connect->bind->query the server, as well
as the rest of the background code). Using the pass/fail model as a
front-end to this system, each test basically has to be its own rule, and
each rule has to call its own eval() in order for each rule to use its
defined weighting (eg, -50 for whitelisted, +50 for blacklisted, on a
per-test basis). But in that model, the core LDAP stuff has to be run ~six
times to process ~six tests, and that's a significant serialization
penalty in sum, just to find out if one of the sending domains is listed
as blacklisted or whitelisted in a local LDAP server. It's so bad that I'm
not sure it's feasible to do this.

What are the thoughts?

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/


Re: plugins and parrallelization

2005-03-14 Thread Daryl C. W. O'Shea
Eric A. Hall wrote:
> It seems that the plugin architecture only allows a single pass/fail
> result, so if you want to have multiple tests with different shades of
> results, you have to call the plugin multiple times. Is that right?
>
> Over the weekend I banged together a preliminary ldapBlacklist.pm plugin
> which lets the master process query an ldap server for whitelist or
> blacklist flags associated with the connecting SMTP client's reverse DNS,
> the HELO identifier, the mail-from address, the From address, and so
> forth... The problem is that each of these tests has to do a fair amount
> of processing with some significant serialization (ie, DNS lookup for SRV
> RRs, DNS lookup for ldap server, connect->bind->query the server, as well
> as the rest of the background code). Using the pass/fail model as a
> front-end to this system, each test basically has to be its own rule, and
> each rule has to call its own eval() in order for each rule to use its
> defined weighting (eg, -50 for whitelisted, +50 for blacklisted, on a
> per-test basis). But in that model, the core LDAP stuff has to be run ~six
> times to process ~six tests, and that's a significant serialization
> penalty in sum, just to find out if one of the sending domains is listed
> as blacklisted or whitelisted in a local LDAP server. It's so bad that I'm
> not sure it's feasible to do this.
>
> What are the thoughts?

There's no need to do your lookups more than once.  Save the results the 
first time you do the lookups/processing.

See the SPF.pm plugin for a good example.
Daryl
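
The "save the results the first time" idea can be sketched independently of SpamAssassin. The following Python example is only an illustration of the caching pattern, not the plugin API (which is Perl); the class, the rule functions, and the fake lookup are all hypothetical:

```python
class LookupCache:
    """Run an expensive lookup once per message and share the result."""

    def __init__(self, lookup):
        self._lookup = lookup   # the expensive function (hypothetical)
        self._results = {}      # message id -> lookup result

    def result_for(self, msg_id, msg):
        # Only the first rule to ask pays for the lookup.
        if msg_id not in self._results:
            self._results[msg_id] = self._lookup(msg)
        return self._results[msg_id]

    def forget(self, msg_id):
        # Call when scanning of a message is finished.
        self._results.pop(msg_id, None)


calls = []

def fake_ldap_lookup(msg):
    """Stand-in for the DNS + LDAP round trips; records each invocation."""
    calls.append(msg["helo"])
    return {"helo": "blacklisted", "from": "whitelisted"}

cache = LookupCache(fake_ldap_lookup)

def rule_helo_blacklisted(msg_id, msg):
    return cache.result_for(msg_id, msg).get("helo") == "blacklisted"

def rule_from_whitelisted(msg_id, msg):
    return cache.result_for(msg_id, msg).get("from") == "whitelisted"
```

Each rule keeps its own pass/fail result and its own score, but the six or so rules share one lookup per message.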


Re: Feeding Bayes aswell

2005-03-14 Thread Kelson
Matt Kettler wrote:
> You can't train SA on most messages from Eudora's .mbx files. Speaking
> as a user of eudora, eudora completely destroys many important parts of
> a message when it stores it in the mbox, and it cannot be reconstructed.

And this was the main reason that after using Eudora for 8 years, I 
finally switched to Thunderbird.

The import process from Eudora to Thunderbird works pretty well, though 
it obviously can't restore information that isn't there. 
multipart/alternative is, of course, toast, though it does a reasonable 
job of re-attaching attachments (though the original mime 
characteristics are long gone).  I had 4 years of mail to test it with, 
and found a lot of bugs for them to fix in the pre-1.0 days!

I think my favorite Eudora craziness was the fact that outgoing mail 
with signatures is stored as HTML, even if you wrote it as plain text, 
but isn't labeled as HTML.

--
Kelson Vibber
SpeedGate Communications 


Re: bayesian tokens in text format?

2005-03-14 Thread Michael Parker
On Mon, Mar 14, 2005 at 04:46:06PM +, Paul Reilly wrote:
> 
> Is it possible to dump the bayesian tokens in
> human readable format still? It was quite useful
> but since 3.0.x they seem to be base64 encoded or
> some other way encoded. I couldn't see any sa-learn
> option, or any FAQ entry about it.

To expand a bit on what Matt said.

In general, no it's not possible to dump the bayesian tokens in a
readable (well they are readable, it's just hard to read them :))
format, unless you do a little work yourself.  It is possible to dump
them by making use of the given plugin hooks that allow you to fetch
the "raw" token value and match it to the SHA1 hash for the token.

FYI, the values you can see, via a --dump or --backup, are actually
hex representations of the binary SHA1 data.

The primary motivation for the change was indeed speed, and let me
tell you it was a lot.  Privacy never really entered into the picture,
although I suppose it is a nice side effect, except that with a plugin
it's pretty easy to map the token values.

I know, the next thing you're going to ask is how do I write a plugin
to do this, well, that is an exercise to the reader.  I did a proof of
concept back when I added the plugin hooks, and may have sent it to
the mailing list so check the archives.  For all the juicy details
check out the comments in this bug:
http://bugzilla.spamassassin.org/show_bug.cgi?id=3331

Of course, I have to ask, how do you find the data "quite useful?"  I
asked on the mailing list several times for examples of how people
might use that data and nothing came along that was very compelling,
at least enough for me to pursue a better more integrated fix.

Michael




Re: plugins and parrallelization

2005-03-14 Thread Eric A. Hall

On 3/14/2005 12:32 PM, Daryl C. W. O'Shea wrote:

> There's no need to do your lookups more than once.  Save the results the 
> first time you do the lookups/processing.
> 
> See the SPF.pm plugin for a good example.

Oh right, it's persistent. Duh.

Thanks


-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/


Re: Razor Files Missing

2005-03-14 Thread Norman Zhang
Matt Florido wrote:
> Here's a good article for ports needed by Razor/Pyzor/DCC.
> http://wiki.apache.org/spamassassin/NetTestFirewallIssues

Thanks for the URL. I'm not too familiar with IPTables. I just allow 
outgoing UDP/6277 for DCC and outgoing TCP/2703 for Razor. I don't have 
any DCC server set up. Is this sufficient?

> Does it also need Pyzor?  If so, install: pyzor.noarch

I don't have Pyzor installed yet. One more question: DCC, Razor and 
Pyzor do not require learning? Only SA requires learning?

Regards,
Norman Zhang


Re: bayesian tokens in text format?

2005-03-14 Thread Matt Kettler
At 01:11 PM 3/14/2005, Michael Parker wrote:
> In general, no it's not possible to dump the bayesian tokens in a
> readable (well they are readable, it's just hard to read them :))
> format, unless you do a little work yourself.  It is possible to dump
> them by making use of the given plugin hooks that allow you to fetch
> the "raw" token value and match it to the SHA1 hash for the token.

True, however, just given a bayes DB in 3.0's normal format, you can't dump 
it in text format. The plugin would have to have been running while the 
bayes DB was created.

> The primary motivation for the change was indeed speed, and let me
> tell you it was a lot.  Privacy never really entered into the picture,
> although I suppose it is a nice side effect, except that with a plugin
> it's pretty easy to map the token values.

True. I guess I misrepresented a desirable-to-some side effect as a reason 
for implementation. Speed was the big motivator.

> Of course, I have to ask, how do you find the data "quite useful?"

It's "quite useful" as dumping the bayes db through sort and looking at the 
tokens helps you identify tokens to look for that may be in misclassified 
messages.

ie: if I see an obfuscated Viagra variant with stats like "0 spam 1 ham 
0.000", I know to go dig around in my archives for a misclassified message 
containing that word and re-train it properly.

However, as I said before, 9 times out of 10 doing this leads to people 
over-manipulating their bayes DB by deciding that a particular token "must 
be" spam or nonspam, and doing things like creating bogus messages to shift 
the training the way they want it. A lot of admins get really worried about 
one or two tokens that don't "look right"... Which is a bad thing.





Re: plugins and parrallelization

2005-03-14 Thread Justin Mason

Eric A. Hall writes:
> On 3/14/2005 12:32 PM, Daryl C. W. O'Shea wrote:
> 
> > There's no need to do your lookups more than once.  Save the results the 
> > first time you do the lookups/processing.
> > 
> > See the SPF.pm plugin for a good example.
> 
> Oh right, it's persistent. Duh.

yeah -- as discussed in the Plugin pod docs, the life-cycle of the objects
you have access to there is:

- PerMsgStatus object: persistent throughout a single message's
  scanning, so good for this case.

- Conf object: global and always available, so good for permanent
  state.  not good for state that may change between users though; to
  deal with that, use the signal_user_changed() hook if necessary.

Those are the two main places to store data if needed.

--j.



Re: Windows with ESA - statistics?

2005-03-14 Thread Tim P
Ok, so how would I use this with all of my .out files?  When I run it
by itself it prints "Date Status Score"
but nothing else, even when in the folder with all of the .out files.
When I attempt to give it files to parse ("reportparse.pl *.out") it
gives me the errors below.

C:\ESA>reportparse.pl
Date    Status  Score

C:\ESA>reportparse.pl c:\esa\ham\
Date    Status  Score
Can't do inplace edit: c:\esa\ham\ is not a regular file at C:\ESA\reportparse.pl line 2.

C:\ESA>reportparse.pl c:\esa\ham\*
Date    Status  Score
Can't open c:\esa\ham\*: Invalid argument at C:\ESA\reportparse.pl line 2.

C:\ESA\Ham>reportparse.pl *
Date    Status  Score
Can't open *: Invalid argument at C:\ESA\Ham\reportparse.pl line 2.
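
The "Can't open *" errors are consistent with the Windows command prompt handing the literal "*" to the script instead of a list of file names, since cmd.exe leaves wildcard expansion to the program. A hedged Python sketch of the same report that expands the patterns itself; it assumes one message per .out file and a four-digit zone offset such as -0700 on the Date line, and the function names are made up:

```python
import glob
import re
import sys

STATUS_RE = re.compile(r"^X-Spam-Status: (Yes|No), score=(-?[0-9.]+)")
DATE_RE = re.compile(r"^Date: (.*) [-+]\d{4}\s*$")

def parse_report(lines):
    """Return (date, status, score) for one message, or None if incomplete."""
    status = score = date = None
    for line in lines:
        m = STATUS_RE.match(line)
        if m and status is None:
            status, score = m.groups()
        m = DATE_RE.match(line)
        if m and date is None:
            date = m.group(1)
    if status is None or date is None:
        return None
    return date, status, score

def main(patterns):
    print("Date\tStatus\tScore")
    for pattern in patterns:
        # Expand wildcards here, because the Windows shell does not.
        for path in sorted(glob.glob(pattern)):
            with open(path, errors="replace") as fh:
                row = parse_report(fh)
            if row:
                print("\t".join(row))

if __name__ == "__main__":
    main(sys.argv[1:] or ["*.out"])
```

Saved under a name of your choosing, it could be run from the folder holding the .out files with the output redirected to a file for import into a spreadsheet.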


On Wed, 9 Mar 2005 11:32:47 -0500, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> From: Tim P [mailto:[EMAIL PROTECTED]
> >
> > I am using SpamAssassin on Windows with the ESA message sync (
> > http://www.christopherlewis.com/ExchangeSpamAssassin.htm ) and
> > would like to have some statistics so that I can better tune the
> > spam assassin (particularly with the minimum spam setting and the
> > purge on discovery setting).
> >
> > As perl is already installed what I really need is a perl script
> > that can go through the messages that the ESA dumps out (.out
> > files that can be read by notepad), grab the following lines and
> > dump them into a file format (like a csv or a delimited text
> > file) so that I can use something to give me statistics.
> >
> > X-Spam-Status: Yes, score=28.9
> > Date: Tue, 4 Jan 2005 06:36:10 -0700
> >
> > I would really like to be able to give statistics by date (daily,
> > weekly, monthly), averages, numbers of spams/hams for a time
> > period.  Most of that can be done using Excel if I have to, but
> > it's getting that info into a readable file format that is beyond
> > me.  Anyone out there know how to do something like that?
> 
> Parsing the file and producing delimited text that can be imported
> into Excel is simple with Perl.
> 
> Assuming that each email scanned will produce both of the lines you
> listed and assuming that those entries are on their own lines and
> not part of a longer line, this should work for you (untested):
> 
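> # Print one tab-separated row per message: Date, Status, Score.
> print "Date\tStatus\tScore\n";
> while (<>)
> {
>     if (/^X-Spam-Status: (Yes|No), score=(-?[0-9.]+)/)
>     {
>         $status = $1;
>         $score = $2;
>     }
>     # Zone offsets on Date lines have four digits, e.g. -0700.
>     if (/^Date: (.*) [-+]\d{4}\s*$/)
>     {
>         $date = $1;
>     }
>     # Test with defined() so that a score of 0 is not skipped.
>     if (defined $status and defined $score and defined $date)
>     {
>         print "$date\t$status\t$score\n";
>         undef $date;
>         undef $status;
>         undef $score;
>     }
> }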
> 
> Output would be tab separated like this (formatted for readability):
> 
> Date                      Status  Score
> Tue, 4 Jan 2005 06:36:10  Yes     28.9
> 
> Bowie
>


2 pops

2005-03-14 Thread S M.C Butler

Hi, I would like to have my mail forwarded to my ISP's account and then
popped to my server, where I can run SpamAssassin, and finally popped a
second time to my PC. How do I get this 2-level pop mechanism going? How can
I pop from my ISP account to my server in a way that will allow me to do a
second pop from /var/mail/username to my PC?

 Thx in advance.


>-----Original Message-----
>From: GRP Productions [mailto:[EMAIL PROTECTED]
>Sent: Sunday, March 13, 2005 2:33 PM
>To: users@spamassassin.apache.org
>Subject: Re: Bayes DB does not grow anymore
>
>>That is the output of --dump magic? I haven't ever seen it formatted that
>>nicely. I assume you skipped the first line, but the expire atime delta is
>>also missing. So, where did you get this from? Not directly from
>>sa-learn --dump magic I'd say. You are running SA thru some interface? You
>>should have said something about the whereabouts of your installation.
>
>You are right, I am using MailWatch. I just posted this output to be easy
>for one to see the actual dates without having to convert. Here is the
>actual output:
>
># /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump
>magic
>0.000  0  3  0  non-token data: bayes db version
>0.000  0  49740  0  non-token data: nspam
>0.000  0  47167  0  non-token data: nham
>0.000  0 123325  0  non-token data: ntokens
>0.000  0 1107319073  0  non-token data: oldest atime
>0.000  0 1110636450  0  non-token data: newest atime
>0.000  0 1108137790  0  non-token data: last journal sync atime
>0.000  0 1108129534  0  non-token data: last expiry atime
>0.000  0 804361  0  non-token data: last expire atime delta
>0.000  0   3475  0  non-token data: last expire reduction count
>
>>Ok. Get the values. Then learn a message to it. Make sure it says that it
>>actually learned, then check the values again. Is either the spam or ham
>>count increased by one or not?
>
>No it isn't. This is exactly the point I mentioned. But as I said earlier,
>sa-learn claims it has learned, even from the web interface:
>>SA Learn: Learned from 1 message(s) (1 message(s) examined).
>
>>Ok, this finally looks a bit suspicious. No sync and no expire for a month.
>>If it doesn't sync you don't get new tokens. Check in your bayes directory
>>how big your bayes_journal is. I'd think it's quite big. Do a sync now.
>>(Please don't do it via an interface, do it on the command line.) What's the
>>output? Is the journal gone and the number of tokens increased now? If so,
>>you need to investigate why it doesn't sync anymore. Also do an expire then.
>
>This is getting more suspicious: there is no bayes_journal file!
>
># ll /var/spool/MailScanner/bayes/
>total 11780
>drwxrwxrwx  2 root nobody 4096 Mar 14 00:22 .
>drwxr-xr-x  4 root nobody 4096 Mar 13 11:55 ..
>-rw-rw-rw-  1 root nobody 1236 Mar 14 00:22 bayes.mutex
>-rw-rw-rw-  1 root nobody 10452992 Mar 14 00:22 bayes_seen
>-rw-rw-rw-  1 root nobody  5509120 Mar 14 00:02 bayes_toks
>
>I can assure you no one has touched anything inside this directory. If this
>is the reason for the problems I've been facing, is there a way to recreate
>the file without having to lose my current data? (perhaps by copying the
>above files somewhere, execute sa-learn --clear and some time later restore
>the above files?)
>
>Thanks for your help
>



Re: 2 pops

2005-03-14 Thread Matt Kettler
At 02:20 PM 3/14/2005, S M.C Butler wrote:
> Hi, I would like to have my mail forwarded to my ISP's account and then
> popped to my server where I can run spam assassin and finally popped a
> second time to my PC. How do I get this 2-level pop mechanism going? How can
> I pop from my ISP account to my server in a way that will allow me to do a
> second pop from /var/mail/username to my pc
>  Thx in advance.

1) what does this have to do with the thread "Re: Bayes DB does not grow 
anymore"?

2) man fetchmail 



Multiple local configurations

2005-03-14 Thread Russell P. Sutherland
Is it possible to use the spamc/spamd pair
by invoking spamc with a per-instance reference
to a list of customized local.cf files?

I would like to have different rules on a per
destination domain basis. Is this possible?

-- 
Quist Consulting        Email: [EMAIL PROTECTED]
219 Donlea Drive        Voice: +1.416.696.7600
Toronto ON  M4G 2N1     Cell:  +1.416.803.0080
CANADA                  WWW:   http://www.quist.ca


RE: 2 pops

2005-03-14 Thread S M.C Butler

>
>1) what does this have to do with the thread "Re: Bayes DB does not grow
>anymore"?
>

oops I replied to that mail to get the mailing list address and forgot to
delete the inline text, sorry about that.

>2) man fetchmail

thx, I'll check it out.



Re: Multiple local configurations

2005-03-14 Thread Matt Kettler
At 02:50 PM 3/14/2005, Russell P. Sutherland wrote:
> Is it possible to use the spamc/spamd pair
> by invoking spamc with a per-instance reference
> to a list of customized local.cf files?

No, local.cf level files are parsed only at the start of spamd. By the time 
spamc is called, it's too late.

> I would like to have different rules on a per
> destination domain basis. Is this possible?

You have 2 options:

1) run 2 spamds, each with different listening ports specified with -p, and 
different --siteconfigpath's. Select which one you use by what you pass to 
-p on the spamc command line.

Advantage: easy, and still allows per-user customization.
Drawback: doesn't scale well, wastes memory

2) if you don't have per-user customization, you could create 2 user 
accounts and pass the appropriate username to spamc -u. You'd also need 
"allow_user_rules 1" in your local.cf, which is not a risk if no untrusted 
users have local accounts that can log in.
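
Option 1 could be wired up roughly as follows. This is only a sketch in Python, assuming one spamd instance per domain is already running on its own port with its own --siteconfigpath; the domain names, port numbers and function name are made up for illustration.

```python
# Hypothetical mapping: recipient domain -> port of the spamd instance
# that was started with that domain's --siteconfigpath.
SPAMD_PORTS = {
    "example.com": 783,
    "example.org": 784,
}
DEFAULT_PORT = 783


def spamc_command(recipient):
    """Return the spamc argument list for one recipient address."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    port = SPAMD_PORTS.get(domain, DEFAULT_PORT)
    return ["spamc", "-p", str(port)]
```

The delivery script would then pipe the message through the returned command, for instance with subprocess.run(spamc_command(rcpt), input=raw_message, capture_output=True). For option 2 the same lookup could return ["spamc", "-u", username] instead.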




RE: 2 pops

2005-03-14 Thread Matt Kettler
At 02:53 PM 3/14/2005, S M.C Butler wrote:
>>1) what does this have to do with the thread "Re: Bayes DB does not grow
>>anymore"?
>
> oops I replied to that mail to get the mailing list address and forgot to
> delete the inline text, sorry about that.

Even if you did remove the inline text, it's still going to show up as a 
reply to that thread. The "In-Reply-To:" header will give you away. From 
your original post:

In-Reply-To: <[EMAIL PROTECTED]>

Based on this, any threading mail readers and list archives will bury your 
post as a reply, rather than showing it as a new thread.

Take a look at the GMANE archives for an example of threading:
http://news.gmane.org/gmane.mail.spam.spamassassin.general
The big difference in a mail client is that threading mail clients 
generally allow you to collapse threads and you don't see the posts under 
them when collapsed.

When posting a new thread it's really in your best interest to just create 
a new message, and not try to hijack a reply into being something it's not. 
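
To check a draft before sending, something along these lines would do. It is a small Python sketch using the standard email module; the sample messages and the function name are invented, and the References header is included on the assumption that threading software looks at it as well as In-Reply-To.

```python
from email import message_from_string


def is_threaded_reply(raw_message):
    """True if the message carries In-Reply-To or References headers."""
    msg = message_from_string(raw_message)
    return bool(msg.get("In-Reply-To") or msg.get("References"))


# Invented sample messages: a fresh post and a "hijacked" reply.
new_post = "From: a@example.com\nSubject: 2 pops\n\nHi, ...\n"
hijacked = (
    "From: a@example.com\n"
    "Subject: 2 pops\n"
    "In-Reply-To: <1234@example.net>\n"
    "\nHi, ...\n"
)

print(is_threaded_reply(new_post))   # False
print(is_threaded_reply(hijacked))   # True
```

Changing the subject and deleting the quoted text leaves the header in place, which is why the second message is still filed under the old thread.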



Bayes not Available

2005-03-14 Thread Norman Zhang
Hi,
When I run amavisd debug-sa, I see the following. Is my SA bayes 
misconfigured? May I ask for a few pointers?

Regards,
Norman Zhang
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200
required_hits 5
rewrite_header Subject [SPAM]
report_safe 0
skip_rbl_checks 0
bayes_path /var/lib/amavis/var/.spamassassin
use_bayes 1
use_razor2 1
use_dcc 1
use_pyzor 1
dns_available yes
auto_whitelist_path /var/spool/spamassassin/auto-whitelist
auto_whitelist_file_mode 0666
dcc_dccifd_path /var/lib/dcc/dccifd
dcc_path /usr/bin
dcc_home /var/lib/dcc
score DCC_CHECK 4.000
score SPF_FAIL 10.000
score SPF_HELO_FAIL 10.000
score RAZOR2_CHECK 2.500
score BAYES_99 4.300
score BAYES_90 3.500
score BAYES_80 3.000


RE: Bayes not Available

2005-03-14 Thread Matthew.van.Eerde
Norman Zhang wrote:
> Hi,
> 
> When I run amavisd debug-sa, I see the following. Is my SA bayes
> misconfigured? May I ask for a few pointers?
> bayes_path /var/lib/amavis/var/.spamassassin

Should be
/var/lib/amavis/var/.spamassassin/bayes

surely (the final "/bayes" is not a path but rather a filename prefix)

Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
perl -e"map{y/a-z/l-za-k/;print}shift" "Jjhi pcdiwtg Ptga wprztg," 
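
To make the "filename prefix" point concrete, here is a small Python sketch of how the database file names follow from bayes_path. The suffixes are taken from the bayes_toks, bayes_seen and bayes_journal names mentioned elsewhere on this list; the helper name is made up, and this only illustrates the naming, not SpamAssassin's actual code.

```python
def bayes_files(bayes_path):
    """Return the file names a DBM-style Bayes store would derive from bayes_path."""
    return [bayes_path + suffix for suffix in ("_toks", "_seen", "_journal")]


# Without the trailing "/bayes" the files land next to the directory,
# named after it, rather than inside it.
print(bayes_files("/var/lib/amavis/var/.spamassassin"))
print(bayes_files("/var/lib/amavis/var/.spamassassin/bayes"))
```

With the first setting the token file would be /var/lib/amavis/var/.spamassassin_toks; with the second it is /var/lib/amavis/var/.spamassassin/bayes_toks.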


Re: Bayes not Available

2005-03-14 Thread Norman Zhang
[EMAIL PROTECTED] wrote:
> Should be
> /var/lib/amavis/var/.spamassassin/bayes
> surely (the final "/bayes" is not a path but rather a filename prefix)

I have changed

bayes_path /var/lib/amavis/var/.spamassassin/bayes

but I still see the "Not available for scanning". Am I missing something?

Regards,
Norman Zhang

debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200
debug: bayes: 9566 untie-ing
debug: bayes: 9566 untie-ing db_toks
debug: bayes: 9566 untie-ing db_seen


RE: save debug output and errors

2005-03-14 Thread Ben Wylie
-----Original Message-----
From: Ben Wylie [mailto:[EMAIL PROTECTED] 
Sent: 14 March 2005 13:28
To: users@spamassassin.apache.org
Subject: RE: save debug output and errors
>
> Having managed to capture the errors which pop up when my mailserver runs
> the spamassassin batch script, it is coming up with two errors:
>
> Argument "BODY" isn't numeric in addition (+) at
> F:\Perl\site\lib/Mail/SpamAssassin/Conf.pm line 244.
>
> bayes: bayes db version 2 is not able to be used, aborting! at
> F:\Perl\site\lib/Mail/SpamAssassin/BayesStore/DBM.pm line 160.
>
> When I run the exact same batch script by hand, I only get the first
> error, and the bayes function works.
>
> Does this "bayes db version 2 is not able to be used" error mean something
> specific or is it just a general bayes error?

Following up on my own post.
I have discovered it is because my Mailserver is running as user SYSTEM. It
therefore also runs SA as SYSTEM, and looks for the bayes database in:
F:\Documents and Settings\NetworkService/.spamassassin/

Using the bayes_path option, I have now got it looking in the correct folder
and all is well.
Thanks for the help with getting the debug info.

Ben






Re: Bayes not Available

2005-03-14 Thread Andy Jezierski

Norman Zhang <[EMAIL PROTECTED]>
wrote on 03/14/2005 03:29:25 PM:

> debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200

You need to teach Bayes at least 200 spam and 200
non-spam messages before it will do anything for you.

Andy
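
A quick way to see how far a database is from that threshold is to read the counters out of the "sa-learn --dump magic" output. The following Python sketch assumes the column layout shown in the dumps posted on this list; the function names are invented, and 200 is simply the figure from the debug line above.

```python
import re

MIN_LEARNED = 200  # threshold quoted in the debug line above


def parse_dump_magic(text):
    """Extract the 'non-token data' counters from `sa-learn --dump magic` output."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+)$", line)
        if match:
            counters[match.group(2).strip()] = int(match.group(1))
    return counters


def bayes_ready(counters):
    """True once both the spam and the ham counters have reached the threshold."""
    return (counters.get("nspam", 0) >= MIN_LEARNED
            and counters.get("nham", 0) >= MIN_LEARNED)
```

Feed it the dump produced for the same user and bayes_path that the scanner uses; a result of False with nspam at 0 matches the "only 0 spam(s) in Bayes DB" message.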

Re: bayesian tokens in text format?

2005-03-14 Thread Paul Reilly

> Of course, I have to ask, how do you find the data "quite useful?"  I

It's useful to see what words/tokens are getting high scores.
The bayes database on one of my machines seems to be not
as accurate as the others, and msgs passing through that
machine are getting negative bayes scores (-1.7 etc.).
I wanted to see the tokens to see if I could see anything
unusual which might be causing this. But it's not a big issue.

Thanks all,

Paul



Re: bayesian tokens in text format?

2005-03-14 Thread Michael Parker
On Mon, Mar 14, 2005 at 10:23:37PM +, Paul Reilly wrote:
> 
> > Of course, I have to ask, how do you find the data "quite useful?"  I
> 
> It's useful to see what words/tokens are getting high scores.
> The bayes database on one of my machines seems to be not
> as accurate as the others, and msgs passing through that
> machine are getting negative bayes scores (-1.7 etc.).
> I wanted to see the tokens to see if I could see anything
> unusual which might be causing this. But it's not a big issue.
> 

You don't need to see all of the tokens in the database to see this.
There are several bayes based tags that can give you this sort of
information on a per msg basis. perldoc Mail::SpamAssassin::Conf

Michael




Re: Bayes DB does not grow anymore

2005-03-14 Thread GRP Productions
I have been trying to get something from CVS for several days now, no luck.
Send me your email in private ([EMAIL PROTECTED]) to send it to you.
Bayes needs constant training, but this doesn't mean it needs any manual
training. Once it's up and running and "well-greased" it should take care of
itself by auto-learning (bayes_auto_learn 1, don't know if on by default).
About 70 or 80% of our spam and ham (especially the spam) is autolearned.
I will probably start again from scratch. One point: Do you think I should 
put custom rules inside /etc/mail/spamassassin or the default installation 
is enough?

Actually, with those "few" tokens you won't lose much if you throw it away 
;-)
As I said upping that should help, no need to throw it away unless you think
that's easier (if most spam you get scores at BAYES_50 it might be better to
start over than to convince the db that it's spam).
I'll probably do it.
> bayes_auto_expire 0
> bayes_expiry_max_db_size 50
I assume you just added/changed that?
Yes, I just added this. Should auto_expire always remain at 0? Also, do you 
think it would be better if the db NEVER expired? Would this value of 50 
achieve that? I don't want to come to work some day and see my tokens were 
lost again :-(

In general, should I do as you said, i.e. trust the autolearn system and 
never use sa-learn again, given that I do not have the time to do full 
training?

Thanks for giving me so much of your time, and being so patient with my 
silly questions.
Best regards,
Greg
