Re: Can someone explain how to read Bayes stats?

2019-11-27 Thread @lbutlr
On 27 Nov 2019, at 06:52, Anders Gustafsson  wrote:
> 0.000  0   3184  0  non-token data: nspam
> 0.000  0  17298  0  non-token data: nham

Plenty of spam and ham learned

> 0.000  0 1553643652  0  non-token data: oldest atime

Oldest data is from March

> 0.000  0 1574862537  0  non-token data: newest atime

Newest date from today
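
The two atime values are Unix epoch seconds. A minimal sketch for turning them into dates, assuming either GNU date or BSD date is installed (the function name epoch_to_date is made up for illustration):

```shell
# Convert Unix epoch seconds to a UTC timestamp.
# GNU date understands "-d @SECONDS"; BSD/macOS date understands "-r SECONDS".
epoch_to_date() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S' 2>/dev/null ||
        date -u -r "$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_date 1553643652   # oldest atime from the dump above
epoch_to_date 1574862537   # newest atime from the dump above
```

With the values above this prints 2019-03-26 23:40:52 and 2019-11-27 13:48:57 (UTC), which matches "March" and "today".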

> I had SA running before, but had to take a break because of upgrades. I have
> not had the chance yet to collect over 200 SPAM/HAM messages for training.

You have, but chances are most of it is old. Still, that doesn’t mean useless.

You should see Bayes scores in incoming mail.


-- 
"Are you pondering what I'm pondering?"
"I think so, Brain, but Zero Mostel times anything will still give
you Zero Mostel.”



Re: Can someone explain how to read Bayes stats?

2019-11-27 Thread Matus UHLAR - fantomas

On 27.11.19 15:52, Anders Gustafsson wrote:

pamir:~ # sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0   3184  0  non-token data: nspam
0.000  0  17298  0  non-token data: nham
0.000  0 164549  0  non-token data: ntokens
0.000  0 1553643652  0  non-token data: oldest atime
0.000  0 1574862537  0  non-token data: newest atime
0.000  0 1574856320  0  non-token data: last journal sync atime
0.000  0 1574848041  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count

I had SA running before, but had to take a break because of upgrades. I have not
had the chance yet to collect over 200 SPAM/HAM messages for training.


according to the info above, there were 3184 spams and 17298 hams learned,
both over the limit.

bayes should hit, unless bayes has been turned off, or a different account was
used for scanning.
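
The check described above can be sketched as a small filter over the dump output. This assumes the column layout shown in the dump (count in the third column, label last) and the stock minimum of 200, which is the default for bayes_min_spam_num and bayes_min_ham_num; bayes_ready is a made-up name:

```shell
# Read "sa-learn --dump magic" output on stdin and report whether both
# nspam and nham reach the minimum (default 200, overridable as $1).
# Exit status is 0 when both counts are high enough, 1 otherwise.
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 }
        $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d min=%d\n", nspam, nham, min
            exit !(nspam >= min && nham >= min)
        }'
}

# Typical use, run as the same user that scans the mail:
#   sa-learn --dump magic | bayes_ready
```

Running it as the wrong user would read a different (possibly empty) database, which is the "different account" case mentioned above.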

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Honk if you love peace and quiet.


Can someone explain how to read Bayes stats?

2019-11-27 Thread Anders Gustafsson
Ie:


pamir:~ # sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0   3184  0  non-token data: nspam
0.000  0  17298  0  non-token data: nham
0.000  0 164549  0  non-token data: ntokens
0.000  0 1553643652  0  non-token data: oldest atime
0.000  0 1574862537  0  non-token data: newest atime
0.000  0 1574856320  0  non-token data: last journal sync atime
0.000  0 1574848041  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count

I had SA running before, but had to take a break because of upgrades. I have not
had the chance yet to collect over 200 SPAM/HAM messages for training.

-- 
Anders Gustafsson
Engineer, CNI, CNE6, ASE
Pedago, The Aaland Islands (N60 E20)
www.pedago.fi
phone +358 18 12060
mobile +358 40506 7099







Re: sa-stats log analyzer (RE: Missed spam, suggestions?)

2016-03-13 Thread rob...@chalmers.com.au
The rulesemporium site appears to be down. 
If anyone has a newer version, it might be good to post it somewhere? My site 
for eg?

Robert


Sent from my iPad

> On 11 Mar 2016, at 04:17, David B Funk  wrote:
> 
> That's the output from Dallas Engelken's "sa-stats.pl" log analyzer.
> You feed it a segment of your spamd logs and it gives you
> those rule hit statistics.
> 
> See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
> 
> Looking at that wiki page, I noticed that the copy available is v0.93.
> I've got v1.03.
> Does anybody know what the newest version last available on the
> rulesemporium site was? Anybody got something newer than v1.03?
> 
> I've done a bit of hacking to my copy (such as adding the S/O ratio stats).
> 
> 

sa-stats log analyzer (RE: Missed spam, suggestions?)

2016-03-10 Thread David B Funk

That's the output from Dallas Engelken's "sa-stats.pl" log analyzer.
You feed it a segment of your spamd logs and it gives you
those rule hit statistics.

See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers

Looking at that wiki page, I noticed that the copy available is v0.93.
I've got v1.03
Does anybody know what the newest version last available on the rulesemporium
site was? Anybody got something newer than v1.03?


I've done a bit of hacking to my copy (such as adding the S/O ratio stats).
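
For readers wondering what the S/O column means: the HTML_MESSAGE rows in the monthly tables further down (90850 spam hits, 171992 ham hits, S/O 0.3456) are consistent with S/O being spam hits divided by all hits. That formula is an inference from those two rows, not something taken from the sa-stats.pl source, and so_ratio is a made-up helper name:

```shell
# S/O ratio, assumed here to be spam_hits / (spam_hits + ham_hits).
# $1 = number of spam messages the rule hit, $2 = number of ham messages.
so_ratio() {
    awk -v s="$1" -v h="$2" 'BEGIN {
        if (s + h == 0) { print "n/a"; exit }   # rule never fired
        printf "%.4f\n", s / (s + h)
    }'
}

so_ratio 90850 171992   # HTML_MESSAGE counts from the monthly tables
```

A value near 1 means the rule fires almost only on spam, a value near 0 almost only on ham, and a value near 0.5 means the rule says little either way.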


On Thu, 10 Mar 2016, Erickarlo Porro wrote:



I would like to know how to get these stats too.

 

From: Robert Chalmers [mailto:rob...@chalmers.com.au]
Sent: Tuesday, March 08, 2016 5:25 AM
To: users@spamassassin.apache.org
Subject: Re: Missed spam, suggestions?

 

Can I ask, how are you getting these stats please?

 

Thanks

  On 8 Mar 2016, at 05:11, David B Funk  
wrote:

 

On Mon, 7 Mar 2016, Charles Sprickman wrote:


  I’ve been running with some daily training for a little over a week and 
I’m seeing less spam in my
  inbox.  I’ve seen a few things slip through because bayes tipped them 
below the default score, these
  were two phishing emails.

  Here’s some rule stats for anyone interested:

  TOP SPAM RULES FIRED

  RANK  RULE NAME               COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

     1  TXREP                   13171     8.47   40.38   91.00   72.91
     2  HTML_MESSAGE            12714     8.18   38.98   87.85   90.80
     3  DCC_CHECK               10593     6.81   32.48   73.19   33.78
     4  RDNS_NONE               10269     6.60   31.48   70.95    5.63
     5  SPF_HELO_PASS           10070     6.48   30.87   69.58   23.41
     6  URIBL_BLACK              9711     6.25   29.77   67.10    1.58
     7  BODY_NEWDOMAIN_FMBLA     9550     6.14   29.28   65.98    1.64
     8  FROM_NEWDOMAIN_FMBLA     9483     6.10   29.07   65.52    1.36
     9  BAYES_99                 8486     5.46   26.02   58.63    1.18
    10  BAYES_999                8141     5.24   24.96   56.25    1.06

  TOP HAM RULES FIRED

  RANK  RULE NAME               COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

     1  HTML_MESSAGE            16473     9.13   50.51   87.85   90.80
     2  DKIM_SIGNED             13776     7.64   42.24   13.81   75.93
     3  TXREP                   13228     7.33   40.56   91.00   72.91
     4  DKIM_VALID              12962     7.19   39.74   11.93   71.44
     5  RCVD_IN_DNSWL_NONE       9941     5.51   30.48    8.08   54.79
     6  DKIM_VALID_AU            8711     4.83   26.71    7.99   48.01
     7  BAYES_00                 8390     4.65   25.72    1.84   46.24
     8  RCVD_IN_JMF_W            7369     4.09   22.59    2.54   40.62
     9  RCVD_IN_MSPIKE_WL        6713     3.72   20.58    4.39   37.00
    10  BAYES_50                 6201     3.44   19.01   25.56   34.18


Based upon your stats it looks like you need more Bayes training. Your Bayes 
00/99 hits should rank higher in the
rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
(of course if you've only been training for a week that would explain it).

For example, here's my top-10 hits (for a one month interval).

TOP SPAM RULES FIRED
--
RANK    RULE NAME   COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
--
  1    T__BOTNET_NOTRUST   114907   60.32   86.81   42.66  0.5755
  2    BAYES_99    109138   32.98   82.45    0.01  0.9998
  3    BAYES_999   104903   31.70   79.25    0.01  0.
  4    HTML_MESSAGE    90850    79.41   68.63   86.59  0.3456
  5    URIBL_BLACK 90845    27.61   68.63    0.27  0.9942
  6    T_QUARANTINE_1  90640    27.40   68.47    0.02  0.9996
  7    URIBL_DBL_SPAM  79152    24.02   59.79    0.17  0.9956
  8    KAM_VERY_BLACK_DBL  74301    22.45   56.13    0.00  1.
  9    L_FROM_SPAMMER1k    73667    22.26   55.65    0.00  1.
 10    T__RECEIVED_1   72413    42.60   54.70   34.54  0.5135

TOP HAM RULES FIRED
--
RANK    RULE NAME   COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
--
  1    BAYES_00    182674   56.03    2.11   91.97  0.0150
  2    HTML_MESSAGE    171992   79.41   68.63   86.59  0.3456
  3    SPF_PASS 

Re: Uptick in spam (bayes stats script)

2015-02-22 Thread Reindl Harald



On 22.02.2015 at 15:30, @lbutlr wrote:

On 21 Feb 2015, at 08:34 , LuKreme  wrote:

On Feb 18, 2015, at 6:20 AM, Reindl Harald  wrote:





That is a lot cleaner and more obvious, thank you for sharing


I ran this just after log rotation and got div by zero errors, so here is a 
(nearly) completely pointless ‘fix’:

BAYES_TOTAL=`echo "$BAYES_00+$BAYES_05+$BAYES_20+$BAYES_40+$BAYES_50+$BAYES_60+$BAYES_80+$BAYES_95+$BAYES_99" | bc`

+ if [ ! $BAYES_TOTAL ]; then
   BAYES_00_PCT=`echo "scale=2; ($BAYES_00*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`

…

   echo -e "BAYES_999 `printf \"%*s\" 8 $BAYES_999` `printf \"%*s\" 7 $BAYES_999_PCT` %"
+ fi

Yes, yes, I know, had I run the script a minute later, no error. But if I 
didn’t have OCD tendencies, would I even be on this list? :)


agreed - thanks - but the "if" doesn't work here; a better one is below

- if [ ! $BAYES_TOTAL ]; then
+ if [ "$BAYES_TOTAL" -gt 0 ]; then
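
Why the first form fails: [ ! $BAYES_TOTAL ] only tests whether the string is empty, and "0" is not empty, so the test is false for every count and the guarded block never runs. A minimal sketch of the numeric guard as a reusable helper, assuming awk is acceptable in place of bc (pct is a made-up name; awk rounds where bc with scale=2 truncates, so the last digit can differ from the script's output):

```shell
# Percentage of $1 in $2 with two decimals; "0.00" when the total is 0,
# which is the situation right after log rotation.
pct() {
    if [ "${2:-0}" -gt 0 ]; then
        awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n * 100 / t }'
    else
        echo "0.00"
    fi
}

pct 28000 36916   # BAYES_00 count against the sum of BAYES_00..BAYES_99
pct 0 0           # empty log: no division is attempted
```

Quoting "$BAYES_TOTAL" inside the test also keeps the comparison from turning into a syntax error when the variable is empty.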



signature.asc
Description: OpenPGP digital signature


Re: Uptick in spam (bayes stats script)

2015-02-22 Thread @lbutlr
On 21 Feb 2015, at 08:34 , LuKreme  wrote:
> On Feb 18, 2015, at 6:20 AM, Reindl Harald  wrote:
>> 
>> 
> 
> That is a lot cleaner and more obvious, thank you for sharing

I ran this just after log rotation and got div by zero errors, so here is a 
(nearly) completely pointless ‘fix’:

BAYES_TOTAL=`echo "$BAYES_00+$BAYES_05+$BAYES_20+$BAYES_40+$BAYES_50+$BAYES_60+$BAYES_80+$BAYES_95+$BAYES_99" | bc`

+ if [ ! $BAYES_TOTAL ]; then
  BAYES_00_PCT=`echo "scale=2; ($BAYES_00*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`

…

  echo -e "BAYES_999 `printf \"%*s\" 8 $BAYES_999` `printf \"%*s\" 7 $BAYES_999_PCT` %"
+ fi

Yes, yes, I know, had I run the script a minute later, no error. But if I 
didn’t have OCD tendencies, would I even be on this list? :)

-- 
And she was drifting through the backyard
And she was taking off her dress
And she was moving very slowly
Rising up above the earth



Re: Uptick in spam (bayes stats script)

2015-02-21 Thread LuKreme
On Feb 18, 2015, at 6:20 AM, Reindl Harald  wrote:
> 
> 

That is a lot cleaner and more obvious, thank you for sharing


-- 
Once again I teeter at the precipice of the generation gap.



Re: Uptick in spam (bayes stats script)

2015-02-18 Thread Reindl Harald


On 17.02.2015 at 15:23, Reindl Harald wrote:

On 17.02.2015 at 15:19, LuKreme wrote:

On 16 Feb 2015, at 12:01 , Reindl Harald  wrote:

given that 24266 messages had BAYES_00 with a total number of 30401
delivered mails in the current month that training strategy seems to
work well

[root@mail-gw:~]$ bayes-stats.sh


What is bayes-stats.sh?


a simple shell script


nicer version attached as plain-text file
using now bash + bc + printf for % and formatting

removed the su calls by placing it in a worker dir and calling it with "su"
from a script in PATH; the output now looks like this


bayes-stats.sh
0.000  0          3  0  non-token data: bayes db version
0.000  0      10606  0  non-token data: nspam
0.000  0      10688  0  non-token data: nham
0.000  0    1387376  0  non-token data: ntokens
0.000  0  993467899  0  non-token data: oldest atime
0.000  0 1424264407  0  non-token data: newest atime
0.000  0 1424264867  0  non-token data: last journal sync atime
0.000  0          0  0  non-token data: last expiry atime
0.000  0          0  0  non-token data: last expire atime delta
0.000  0          0  0  non-token data: last expire reduction count


total 35M
-rw--- 1 sa-milt sa-milt 2,6M 2015-02-18 14:07 bayes_seen
-rw--- 1 sa-milt sa-milt  40M 2015-02-18 14:07 bayes_toks
-rw--- 1 sa-milt sa-milt   98 2015-02-17 11:37 user_prefs

BAYES_00     28000   75.84 %
BAYES_05       437    1.18 %
BAYES_20       546    1.47 %
BAYES_40       597    1.61 %
BAYES_50      4503   12.19 %
BAYES_60       437    1.18 %
BAYES_80       322    0.87 %
BAYES_95       224    0.60 %
BAYES_99      1850    5.01 %
BAYES_999     1647    4.46 %

Delivered:34896
SpamAssassin: 3071

#!/usr/bin/bash

MAILLOG="/var/log/maillog"

# Bayes database summary
/usr/bin/sa-learn --dump magic
echo ""

# Size of the Bayes files
/usr/bin/ls -l -h --color=tty -X --group-directories-first --time-style=long-iso /var/lib/spamass-milter/.spamassassin/
echo ""

# Count how often each BAYES_* rule appears in the spamd result lines
BAYES_00=`grep -c 'spamd: result:.*BAYES_00,' $MAILLOG`
BAYES_05=`grep -c 'spamd: result:.*BAYES_05,' $MAILLOG`
BAYES_20=`grep -c 'spamd: result:.*BAYES_20,' $MAILLOG`
BAYES_40=`grep -c 'spamd: result:.*BAYES_40,' $MAILLOG`
BAYES_50=`grep -c 'spamd: result:.*BAYES_50,' $MAILLOG`
BAYES_60=`grep -c 'spamd: result:.*BAYES_60,' $MAILLOG`
BAYES_80=`grep -c 'spamd: result:.*BAYES_80,' $MAILLOG`
BAYES_95=`grep -c 'spamd: result:.*BAYES_95,' $MAILLOG`
BAYES_99=`grep -c 'spamd: result:.*BAYES_99,' $MAILLOG`
BAYES_999=`grep -c 'spamd: result:.*BAYES_999,' $MAILLOG`

# BAYES_999 is left out of the total: it fires together with BAYES_99
BAYES_TOTAL=`echo "$BAYES_00+$BAYES_05+$BAYES_20+$BAYES_40+$BAYES_50+$BAYES_60+$BAYES_80+$BAYES_95+$BAYES_99" | bc`

# Percentages with two decimals; sed restores the leading zero that bc drops
BAYES_00_PCT=`echo "scale=2; ($BAYES_00*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_05_PCT=`echo "scale=2; ($BAYES_05*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_20_PCT=`echo "scale=2; ($BAYES_20*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_40_PCT=`echo "scale=2; ($BAYES_40*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_50_PCT=`echo "scale=2; ($BAYES_50*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_60_PCT=`echo "scale=2; ($BAYES_60*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_80_PCT=`echo "scale=2; ($BAYES_80*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_95_PCT=`echo "scale=2; ($BAYES_95*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_99_PCT=`echo "scale=2; ($BAYES_99*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`
BAYES_999_PCT=`echo "scale=2; ($BAYES_999*100)/$BAYES_TOTAL" | bc | sed 's/^\./0./'`

echo -e "BAYES_00  `printf \"%*s\" 8 $BAYES_00` `printf \"%*s\" 7 $BAYES_00_PCT` %"
echo -e "BAYES_05  `printf \"%*s\" 8 $BAYES_05` `printf \"%*s\" 7 $BAYES_05_PCT` %"
echo -e "BAYES_20  `printf \"%*s\" 8 $BAYES_20` `printf \"%*s\" 7 $BAYES_20_PCT` %"
echo -e "BAYES_40  `printf \"%*s\" 8 $BAYES_40` `printf \"%*s\" 7 $BAYES_40_PCT` %"
echo -e "BAYES_50  `printf \"%*s\" 8 $BAYES_50` `printf \"%*s\" 7 $BAYES_50_PCT` %"
echo -e "BAYES_60  `printf \"%*s\" 8 $BAYES_60` `printf \"%*s\" 7 $BAYES_60_PCT` %"
echo -e "BAYES_80  `printf \"%*s\" 8 $BAYES_80` `printf \"%*s\" 7 $BAYES_80_PCT` %"
echo -e "BAYES_95  `printf \"%*s\" 8 $BAYES_95` `printf \"%*s\" 7 $BAYES_95_PCT` %"
echo -e "BAYES_99  `printf \"%*s\" 8 $BAYES_99` `printf \"%*s\" 7 $BAYES_99_PCT` %"
echo -e "BAYES_999 `printf \"%*s\" 8 $BAYES_999` `printf \"%*s\" 7 $BAYES_999_PCT` %"
echo ""

echo "Delivered:`grep -c 'relay=.*status=sent' $MAILLOG`"
echo "SpamAssassin: `grep -c 'Blocked by SpamAssassin' $MAILLOG`"


signature.asc
Description: OpenPGP digital signature


Re: EmailBL stats

2009-05-28 Thread Michael Monnerie
On Samstag 23 Mai 2009 Chris wrote:
> EmailB

Of 71 messages where EMAILBL hit, 3 were still marked ham but were really
spam (points: 2.0, 3.0, 3.1); no FPs. One message was pushed just over
5.0 by EMAILBL and would have been a FN otherwise.

So it helps here. We have a very strict setup and only a little spam, and at
least once it prevented a spam from passing through, while having no FPs.
So it's recommendable :-)

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660 / 415 65 31  .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net  Key-ID: 1C1209B4



signature.asc
Description: This is a digitally signed message part.


Re: Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-26 Thread Justin Mason
btw guys, note that hit-frequencies can also produce rule-overlap reports using
the "-o" switch

--j.

On Tue, May 26, 2009 at 00:57, Mandy  wrote:
> On Fri, May 22, 2009 at 9:06 PM, Henrik K  wrote:
>> On Fri, May 22, 2009 at 09:28:55PM +0200, Karsten Bräckelmann wrote:
>>> > The EmailBL test zone period has been extended to July 1st.
>
> [snip]
>
>> Thanks. And this is just a small scale test. If we used more domains, feeds,
>> and submissions, it could be even nicer. ;-) Keep the reports coming in. It
>> would be nice to also know how much of spam are generally from freemails, so
>> FREEMAIL_FROM/BODY/REPLYTO figures would be nice also when reporting. It
>> might differ from user to user.
>
> I just spent some time putting together some stats.  I'm going to try
> to follow the excellent lead of Karsten, and provide some overlap
> figures based on the cool grep formula that Dan Mcdonald showed.  The
> short version is that it hits about 12% of spam scoring under 15.
>
> The time period is somewhat short: May 22 to May 25.  It's a little
> inaccurate too, due to 12 hours of extra mail in the May 22 side
> because I implemented at noon, but...
>
> As I mentioned before, this is from a mid-sized install of Canadian
> government & education users (somewhere around 100 000 mailboxes).  SA
> only sees a filtered mail-stream in my setup -- to give an idea how
> filtered, 75% of the mail that SA sees is classified as ham.  The
> totals volumes were 192 530 Spam, 564 483 Ham.
>
>
> 24.5% of the spam that's tagged is between 5 & 10 score.
> 2.76% of that mail hit EMAILBL_TEST_LEM.
> 0.95% hit FREEMAIL_REPLYTO
>
> 22.9% of the spam that's tagged is between 10 and 15.
> 8.97% of that mail hit EMAILBL_TEST_LEM.
> 1.20% hit FREEMAIL_REPLYTO
>
> 52.5% of the spam that's tagged is above 15.
> 21.41% of that mail hit EMAILBL_TEST_LEM.
> 2.36% hit FREEMAIL_REPLYTO
>
> I also saw 0.05% hits of EMAILBL_TEST_LEM on mail classified as ham.
> I hand-verified the 35 messages of 299 that weren't obvious spam.
> About 9 of those were FPs (and those came down to 3 distinct messages
> from lists I sure wouldn't choose to be on).  I can provide them
> off-list if desired.
>
> I saw even fewer FREEMAIL_REPLYTO hits on mail classified as ham.  56,
> or 0.01%.  About 22 of those (based on subject line -- sorry it's the
> end of the day) look legit.
>
> Here are the overlap numbers for mail with score less than 10:
> $ grep EMAILBL_TEST_LEM spamd_since_22nd | perl -ne 'if (/spamd:
> result: Y (\d+)/) { print if $1 <= 10 }' | cut -d' ' -f11 | egrep -o
> '[A-Z0-9_:\.]+?,' | sort | uniq -c | sort -rn | head -n15
>   1304 EMAILBL_TEST_LEM,
>    728 RAZOR2_CHECK,
>    643 RAZOR2_CF_RANGE_51_100,
>    629 RAZOR2_CF_RANGE_E4_51_100,
>    612 BAYES_50,
>    590 FORGED_YAHOO_RCVD,
>    582 BAYES_99,
>    282 HTML_MESSAGE,
>    199 FREEMAIL_FROM,
>    157 ADVANCE_FEE_2,
>    132 FORGED_MUA_OUTLOOK,
>    114 FREEMAIL_REPLYTO,
>    103 RCVD_IN_BRBL,
>     72 SPF_PASS,
>
> And here they are for all hits on EMAILBL_TEST_LEM:
> $ grep EMAILBL_TEST_LEM spamd_since_22nd | cut -d' ' -f11 | egrep -o
> '[A-Z0-9_:\.]+?,' | sort | uniq -c | sort -rn | head -n15
>  41503 EMAILBL_TEST_LEM,
>  38987 BAYES_99,
>  36782 FORGED_MUA_OUTLOOK,
>  36028 ADVANCE_FEE_2,
>  33746 RCVD_IN_BRBL,
>  33506 JM_SOUGHT_FRAUD_3,
>  33214 JM_SOUGHT_FRAUD_2,
>  33186 HTML_MESSAGE,
>  32281 RCVD_IN_BL_SPAMCOP_NET,
>  31953 JM_SOUGHT_FRAUD_1,
>  31914 RDNS_NONE,
>  31893 RCVD_IN_SBL,
>  31883 MIME_HTML_ONLY,
>
> Phew.  Hopefully those numbers are useful.
>
>


Re: Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-25 Thread Mandy
On Fri, May 22, 2009 at 9:06 PM, Henrik K  wrote:
> On Fri, May 22, 2009 at 09:28:55PM +0200, Karsten Bräckelmann wrote:
>> > The EmailBL test zone period has been extended to July 1st.

[snip]

> Thanks. And this is just a small scale test. If we used more domains, feeds,
> and submissions, it could be even nicer. ;-) Keep the reports coming in. It
> would be nice to also know how much of spam are generally from freemails, so
> FREEMAIL_FROM/BODY/REPLYTO figures would be nice also when reporting. It
> might differ from user to user.

I just spent some time putting together some stats.  I'm going to try
to follow the excellent lead of Karsten, and provide some overlap
figures based on the cool grep formula that Dan Mcdonald showed.  The
short version is that it hits about 12% of spam scoring under 15.

The time period is somewhat short: May 22 to May 25.  It's a little
inaccurate too, due to 12 hours of extra mail in the May 22 side
because I implemented at noon, but...

As I mentioned before, this is from a mid-sized install of Canadian
government & education users (somewhere around 100 000 mailboxes).  SA
only sees a filtered mail-stream in my setup -- to give an idea how
filtered, 75% of the mail that SA sees is classified as ham.  The
totals volumes were 192 530 Spam, 564 483 Ham.


24.5% of the spam that's tagged is between 5 & 10 score.
2.76% of that mail hit EMAILBL_TEST_LEM.
0.95% hit FREEMAIL_REPLYTO

22.9% of the spam that's tagged is between 10 and 15.
8.97% of that mail hit EMAILBL_TEST_LEM.
1.20% hit FREEMAIL_REPLYTO

52.5% of the spam that's tagged is above 15.
21.41% of that mail hit EMAILBL_TEST_LEM.
2.36% hit FREEMAIL_REPLYTO

I also saw 0.05% hits of EMAILBL_TEST_LEM on mail classified as ham.
I hand-verified the 35 messages of 299 that weren't obvious spam.
About 9 of those were FPs (and those came down to 3 distinct messages
from lists I sure wouldn't choose to be on).  I can provide them
off-list if desired.

I saw even fewer FREEMAIL_REPLYTO hits on mail classified as ham.  56,
or 0.01%.  About 22 of those (based on subject line -- sorry it's the
end of the day) look legit.

Here are the overlap numbers for mail with score less than 10:
$ grep EMAILBL_TEST_LEM spamd_since_22nd \
    | perl -ne 'if (/spamd: result: Y (\d+)/) { print if $1 <= 10 }' \
    | cut -d' ' -f11 | egrep -o '[A-Z0-9_:\.]+?,' \
    | sort | uniq -c | sort -rn | head -n15
   1304 EMAILBL_TEST_LEM,
    728 RAZOR2_CHECK,
    643 RAZOR2_CF_RANGE_51_100,
    629 RAZOR2_CF_RANGE_E4_51_100,
    612 BAYES_50,
    590 FORGED_YAHOO_RCVD,
    582 BAYES_99,
    282 HTML_MESSAGE,
    199 FREEMAIL_FROM,
    157 ADVANCE_FEE_2,
    132 FORGED_MUA_OUTLOOK,
    114 FREEMAIL_REPLYTO,
    103 RCVD_IN_BRBL,
     72 SPF_PASS,

And here they are for all hits on EMAILBL_TEST_LEM:
$ grep EMAILBL_TEST_LEM spamd_since_22nd \
    | cut -d' ' -f11 | egrep -o '[A-Z0-9_:\.]+?,' \
    | sort | uniq -c | sort -rn | head -n15
  41503 EMAILBL_TEST_LEM,
  38987 BAYES_99,
  36782 FORGED_MUA_OUTLOOK,
  36028 ADVANCE_FEE_2,
  33746 RCVD_IN_BRBL,
  33506 JM_SOUGHT_FRAUD_3,
  33214 JM_SOUGHT_FRAUD_2,
  33186 HTML_MESSAGE,
  32281 RCVD_IN_BL_SPAMCOP_NET,
  31953 JM_SOUGHT_FRAUD_1,
  31914 RDNS_NONE,
  31893 RCVD_IN_SBL,
  31883 MIME_HTML_ONLY,

Phew.  Hopefully those numbers are useful.
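
The pipelines above depend on the rule list being the 11th space-separated field, which varies with the syslog prefix. A minimal sketch that pulls the rule list out by pattern instead, assuming result lines of the form "spamd: result: Y 12 - RULE_A,RULE_B scantime=..." (rule_overlap is a made-up name, and the line format is an assumption about the local spamd log):

```shell
# Count which rules fire on the same messages as a given rule.
# $1 = rule name, $2 = log file containing "spamd: result:" lines.
rule_overlap() {
    grep -F "$1" "$2" |
        # keep only the comma-separated rule list after "result: Y 12 - "
        sed -n 's/.*spamd: result: [Y.] *-\{0,1\}[0-9.]* - \([A-Za-z0-9_,]*\).*/\1/p' |
        tr ',' '\n' |      # one rule per line
        sed '/^$/d' |      # drop empty entries
        sort | uniq -c | sort -rn
}

# Example: rule_overlap EMAILBL_TEST_LEM spamd_since_22nd | head -n15
```

The grep is a plain substring match, so a rule whose name merely contains the given name would be counted as well.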


Re: Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-23 Thread Karsten Bräckelmann
Sorry, quoting self.

> > > An interesting observation is, that the hitrate (in percent) in spam
> > > scoring < 15 is an order of magnitude higher than with high-scoring [1]
> > > spam. This is rare to find...

> That's limited to EmailBL hits, so the total of these hits equal 100%.
> For me that would have been:
> 
>   19.4%  of mail hitting EmailBL has a score < 15
>   80.6%  of mail hitting EmailBL has a score > 15

Oh, and EmailBL hits only a mere 1.02% of *all* my spam anyway. That's
poor, isn't it?

No, it is not! :)  Because it identifies a whopping 10.9% of the spam,
that isn't already branded by all those existing tests.


> And that's what counts in my book. I don't care if the lion's share of
> EmailBL hits are actually high scorers. Those don't need a boost anyway.
> What I do care about are hits in the sneaky-ish crap. And that's where
> it hits on more than 10%.

I don't write rules to target high-scoring spam either. I look at sneaky
spam, and write rules to identify those. If it also hits on a lot of
high scorers, fine. But I couldn't care less, if it does -- as long as
it identifies those going under the radar of a score 15 cutoff.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-23 Thread Karsten Bräckelmann
On Sat, 2009-05-23 at 11:26 -0500, Larry Nedry wrote:
> On 5/22/09 at 9:28 PM +0200 Karsten Bräckelmann wrote:
> >An interesting observation is, that the hitrate (in percent) in spam
> >scoring < 15 is an order of magnitude higher than with high-scoring [1]
> >spam. This is rare to find...
> 
> My EMAILBL_TEST_LEM hitrate leans heavily toward the other end of the
> spectrum with almost 88% scoring > 15.  My data is based on a little more
> than 100,000 emails.

Wait, you're looking at the hits differently than I did.

> Stats for only messages tagged with EMAILBL_TEST_LEM:
> 
> 04.5% scored 00.0 - 05.0
> 03.0% scored 05.0 - 10.0
> 04.5% scored 10.0 - 15.0
> 09.1% scored 15.0 - 20.0
> 78.8% scored 20.0 or higher

That's limited to EmailBL hits, so the total of these hits equal 100%.
For me that would have been:

  19.4%  of mail hitting EmailBL has a score < 15
  80.6%  of mail hitting EmailBL has a score > 15

However, a score > 15 is more than 98.5% of my spam. Taking that into
account, the numbers change drastically. That's what I reported. Less
than 1% hits in ALL spam with a total score of 15 or higher.

Yet, 10.9% hits in ALL spam with a score less than 15.

And that's what counts in my book. I don't care if the lion's share of
EmailBL hits are actually high scorers. Those don't need a boost anyway.
What I do care about are hits in the sneaky-ish crap. And that's where
it hits on more than 10%.


Larry, what numbers do you get, if you count hits in ALL your spam
in-stream, broken down by scores?

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-23 Thread Larry Nedry
On 5/22/09 at 9:28 PM +0200 Karsten Bräckelmann wrote:
>An interesting observation is, that the hitrate (in percent) in spam
>scoring < 15 is an order of magnitude higher than with high-scoring [1]
>spam. This is rare to find...

My EMAILBL_TEST_LEM hitrate leans heavily toward the other end of the
spectrum with almost 88% scoring > 15.  My data is based on a little more
than 100,000 emails.

EMAILBL_TEST_LEM stats for all messages passed through Spamassassin:
- hit 2.00% of all email tagged as spam.
- hit 0.04% of all email tagged as ham.

There were no false positives and the messages tagged as ham were false
negatives.  If I had given EMAILBL_TEST_LEM a score of 2.0 instead of its
current 0.001 all but one of the FNs would have been properly tagged as
spam.

Stats for only messages tagged with EMAILBL_TEST_LEM:

04.5% scored 00.0 - 05.0
03.0% scored 05.0 - 10.0
04.5% scored 10.0 - 15.0
09.1% scored 15.0 - 20.0
78.8% scored 20.0 or higher

Nedry


Re: Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-23 Thread Chris
On Sat, 2009-05-23 at 07:06 +0300, Henrik K wrote:

> Thanks. And this is just a small scale test. If we used more domains, feeds,
> and submissions, it could be even nicer. ;-) Keep the reports coming in. It
> would be nice to also know how much of spam are generally from freemails, so
> FREEMAIL_FROM/BODY/REPLYTO figures would be nice also when reporting. It
> might differ from user to user.
> 
Freemail stats from 3 May through yesterday:

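Rule Name           Score   Ham  Spam  %of Ham  %of Spam
--------------------------------------------------------
FREEMAIL_REPLYTO     2.00     1    46    0.28%    21.20%
FREEMAIL_FROM        0.50     7    87    1.97%    40.09%
--------------------------------------------------------
OVERALL                       7    90    1.97%    41.47%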

-- 
KeyID 0xE372A7DA98E6705C



signature.asc
Description: This is a digitally signed message part


Re: EmailBL stats

2009-05-23 Thread Chris
On Sat, 2009-05-23 at 16:43 +0200, Karsten Bräckelmann wrote:

> > 
> > Those are not the total spam for the day but the cumulative spam from
> > one day to the next. Though the percentile if figured on the total
> 
> Ah, yees. :)  Thanks. I was missing the base before you enabled EmailBL.
> So that draws another picture than the per-month percentage above:
> 
>   11 hits / 95 spam ==  11.6%
> 
> Out of curiosity, do you run any SMTP time checks or blacklists,
> rejecting mail before SA gets to see them? Given those numbers, I assume
> the answer is yes, and these stats don't include the bulk of spam or
> spam connection attempts.

This is a single user home system. Mail is fetched by fetchmail from all
accounts except yahoo and that uses fetchyahoo. All mail, including
yahoo, is then piped through procmail. What doesn't get tossed to the
various mailing lists I belong to gets run through SA. May 14th was the
first day that stats for EmailBL were noted, and the totals up till then
from May 3rd (when I updated the OS) were Total: 318, Ham: 213, Spam: 105.
In yesterday's stats there were Total: 572, Ham: 355, Spam: 217, so a total
of 254 mails arrived between the start date and last night, when the stats
were run. Last night's stats are:

---
  EMAILBL_TEST_LEM  0.50  0 16 0.00%  7.37%

Better explanation?

Chris
-- 
KeyID 0xE372A7DA98E6705C



signature.asc
Description: This is a digitally signed message part


Re: EmailBL stats

2009-05-23 Thread Karsten Bräckelmann
On Fri, 2009-05-22 at 21:53 -0500, Chris wrote:
> On Sat, 2009-05-23 at 04:11 +0200, Karsten Bräckelmann wrote:

> > Sorry, no. :)  The dates and numbers don't match, unless you didn't get
> > any spam early this month.

> Is this what you're looking for:
> Starting point as of 13 May with plug-in - Spam:  97

> 21 May Spam: 10 Total 192
> ---
>   EMAILBL_TEST_LEM0.50  0 11 0.00%  5.73%
> 
> Those are not the total spam for the day but the cumulative spam from
> one day to the next. Though the percentile if figured on the total

Ah, yees. :)  Thanks. I was missing the base before you enabled EmailBL.
So that draws another picture than the per-month percentage above:

  11 hits / 95 spam ==  11.6%

Out of curiosity, do you run any SMTP time checks or blacklists,
rejecting mail before SA gets to see them? Given those numbers, I assume
the answer is yes, and these stats don't include the bulk of spam or
spam connection attempts.


> I must have deleted your earlier post so I can't refer back to it. If
> this is not what your refering to I guess I'm not understanding you
> correctly.

That would be this [1] post, though in a nutshell I was asking for finer
grained numbers, split up by spam score <10, 10-15, 15+. Reason for that
is, that I found EmailBL to specifically hit best in the low-scoring
range, which is rare to find, yet exactly what we need. :)

FWIW, I'm not involved in EmailBL, just curious to verify my
observation.

  guenther


[1] http://markmail.org/message/qku7o5xqnhiofnh2

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



RE: Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-23 Thread McDonald, Dan
-Original Message-
From: Henrik K [mailto:h...@hege.li]
Sent: Fri 22-May-09 23:06
To: users@spamassassin.apache.org
Subject: Re: Stats (was: The EmailBL test zone period has been extended to July 1st.)
 
>On Fri, May 22, 2009 at 09:28:55PM +0200, Karsten Bräckelmann wrote:
>> > The EmailBL test zone period has been extended to July 1st.
>> 
>> 
>> Looks really good to me, guys! Great job.  Can I keep it? :)
>
>Thanks. And this is just a small scale test. If we used more domains, feeds,
>and submissions, it could be even nicer. ;-) Keep the reports coming in. It
>would be nice to also know how much of spam are generally from freemails, so
>FREEMAIL_FROM/BODY/REPLYTO figures would be nice also when reporting. It
>might differ from user to user.

I show very little overlap with FREEMAIL:
$ grep EMAILBL_TEST_LEM= /var/log/mail/info | grep -P -o 'tests=.+?\]' |
    grep -o -P '[\w_:\.]+?=' | sort | uniq -c | sort -rn |
    grep -E 'EMAILBL|FREEMAIL'
231 EMAILBL_TEST_LEM=
186 EMAILBL_TEST_LEM_REPLYTO=
139 EMAILBL_TEST_LEM_BODY=
119 EMAILBL_TEST_LEM_FROM=
  8 FREEMAIL_FROM=
  6 FREEMAIL_REPLYTO=

The same overlap statistics for the same timeframe on Freemail give:
$ grep -A 10 'May 22 07:16:31' /var/log/mail/info | grep FREEMAIL |
    grep -P -o 'tests=.+?\]' | grep -o -P '[\w_:\.]+?=' | sort | uniq -c | sort -rn
 52 tests=
 47 FREEMAIL_FROM=
 25 RELAY_US=
 25 FORGED_MUA_OUTLOOK=
 23 L_P0F_W=
 22 HTML_MESSAGE=
 21 SPF_SOFTFAIL=
 21 SPF_PASS=
 19 MSOE_MID_WRONG_CASE=
 16 L_P0F_Linux=
 15 RAZOR2_CHECK=
 13 RAZOR2_CF_RANGE_E4_51_100=
 13 RAZOR2_CF_RANGE_51_100=
 13 FREEMAIL_REPLYTO=
 12 ADVANCE_FEE_2=
 11 RCVD_IN_BL_SPAMCOP_NET=
 11 EMAILBL_TEST_LEM_FROM=
 10 SUBJ_ALL_CAPS=
 10 RCVD_IN_BRBL_RELAY=
  9 RELAY_NG=
  9 RCVD_IN_SORBS_WEB=
  9 L_P0F_Unix=
  9 EMAILBL_TEST_LEM=
  8 US_DOLLARS_3=
  8 SARE_FRAUD_X3=
  8 RCVD_IN_INVLSIP_RELAY=
  8 EMAILBL_TEST_LEM_REPLYTO=
  7 JM_SOUGHT_FRAUD_3=
  7 BOTNET_OTHER=
  6 SARE_FRAUD_X4=
  6 RDNS_NONE=
  6 EMAILBL_TEST_LEM_BODY=
  6 BOTNET_SOHO=
  5 URG_BIZ=
  5 RCVD_IN_SBL=
  5 MILLION_USD=
  5 JM_SOUGHT_FRAUD_2=
  5 JM_SOUGHT_FRAUD_1=
  5 ADVANCE_FEE_3=
  4 RELAY_CN=
  4 L_P0F_UNKN=
  4 ADVANCE_FEE_4=
  3 SARE_FRAUD_X5=
  3 RAZOR2_CF_RANGE_E4_100=
  3 DATE_IN_PAST_96_XX=
  2 XMAILER_MIMEOLE_OL_1ECD5=
  2 SPF_NEUTRAL=
  2 RELAY_KR=
  2 MIME_HTML_ONLY=
  2 L_UNVERIFIED_GMAIL=
  2 KAM_LOTTO2=
  2 KAM_LOTTO1=
  2 HTML_IMAGE_RATIO_02=
  2 HTML_FONT_SIZE_LARGE=
  2 FORGED_OUTLOOK_HTML=
  2 DEAR_WINNER=
  2 DEAR_FRIEND=
  2 BOTNET_W=
  2 AV:Sanesecurity.SpamL.10208.UNOFFICIAL=
  2 AV:Sanesecurity.ScamL.718.UNOFFICIAL=
  2 AV:Sanesecurity.Junk.7798.UNOFFICIAL=
  2 AV:Sanesecurity.Junk.10603.UNOFFICIAL=
  1 SPF_HELO_PASS=
  1 SARE_LOTTO_SPAM=
  1 RELAY_TW=
  1 RELAY_RU=
  1 RELAY_BR=
  1 RCVD_DOUBLE_IP_LOOSE=
  1 MSGID_FROM_MTA_HEADER=
  1 MISSING_SUBJECT=
  1 MISSING_MIMEOLE=
  1 MIME_QP_LONG_LINE=
  1 MILLION_EURO=
  1 LOTTERY_PH_004470=
  1 JM_SOUGHT_2=
  1 HTML_IMAGE_RATIO_04=
  1 HTML_IMAGE_ONLY_20=
  1 HTML_IMAGE_ONLY_16=
  1 HTML_IMAGE_ONLY_08=
  1 HTML_FONT_SIZE_HUGE=
  1 FROM_LOCAL_NOVOWEL=
  1 FORGED_OUTLOOK_TAGS=
  1 DATE_IN_PAST_12_24=
  1 DATE_IN_FUTURE_96_XX=
  1 DATE_IN_FUTURE_06_12=
  1 DATE_IN_FUTURE_03_06=
  1 AV:Sanesecurity.Scam4.1717.UNOFFICIAL=
  1 AV:Sanesecurity.Scam4.1381.UNOFFICIAL=
  1 AV:Sanesecurity.Junk.15675.UNOFFICIAL=
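
For anyone without GNU grep's -P option, roughly the same tally can be produced in Python. This is only a sketch under assumptions: it expects each log line to carry a `tests=[RULE=score, ...]` field, as the pipelines above imply, and the function name `count_rules` and the sample lines are invented for illustration, not taken from a real log.

```python
import re
from collections import Counter

# Assumed log shape: each scanned message logs "tests=[RULE=score, ...]".
TESTS_FIELD = re.compile(r"tests=\[([^\]]*)\]")
RULE_NAME = re.compile(r"([\w:.]+)=")

def count_rules(lines, must_hit=None):
    """Count rule names per log line, optionally only for lines where must_hit fired."""
    counts = Counter()
    for line in lines:
        field = TESTS_FIELD.search(line)
        if field is None:
            continue
        rules = RULE_NAME.findall(field.group(1))
        if must_hit is not None and must_hit not in rules:
            continue
        counts.update(rules)
    return counts

# Invented sample lines, for illustration only.
sample = [
    "May 22 07:16:31 mx amavis[1]: SPAM, tests=[EMAILBL_TEST_LEM=0.5, FREEMAIL_FROM=1, HTML_MESSAGE=0.001]",
    "May 22 07:17:02 mx amavis[1]: SPAM, tests=[EMAILBL_TEST_LEM=0.5, EMAILBL_TEST_LEM_BODY=0.5]",
    "May 22 07:18:40 mx amavis[1]: SPAM, tests=[FREEMAIL_FROM=1, RAZOR2_CHECK=0.5]",
]

overlap = count_rules(sample, must_hit="EMAILBL_TEST_LEM")
for rule, n in overlap.most_common():
    print(f"{n:4d} {rule}=")
```

Passing `must_hit="FREEMAIL_FROM"` instead would give the reverse view, i.e. which rules fire alongside FREEMAIL.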




Re: Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-22 Thread Henrik K
On Fri, May 22, 2009 at 09:28:55PM +0200, Karsten Bräckelmann wrote:
> > The EmailBL test zone period has been extended to July 1st.
> 
> As promised, here are some results from me, now that I got some half-
> decent spam throughput. Not an ISP, not a company. Have been running the
> original cf for 5 days, then updated. Since then another 5 days passed.
> 
>    8.7%  hits in spam scoring 05-10
>   11.5%  hits in spam scoring 10-15
> 
>   Hit 1 out of 3 FNs (spam, score < 5).  No hits in ham.
> 
> Overall hitrate. Numbers are even better just looking at the last 5 days
> worth, after the update. Less than 1% hits in the spam scoring >= 15,
> but that's entirely perfect. :)
> 
> About half of the hits in spam < 15 also are caught by SOUGHT_FRAUD.
> That overlap was to be expected.
> 
> An interesting observation is, that the hitrate (in percent) in spam
> scoring < 15 is an order of magnitude higher than with high-scoring [1]
> spam. This is rare to find...
> 
> 
> Looks really good to me, guys! Great job.  Can I keep it? :)

Thanks. And this is just a small-scale test. If we used more domains, feeds,
and submissions, it could be even nicer. ;-) Keep the reports coming in. It
would also be nice to know how much of your spam generally comes from
freemail addresses, so please include FREEMAIL_FROM/BODY/REPLYTO figures
when reporting. It might differ from user to user.



Re: EmailBL stats

2009-05-22 Thread Chris
On Sat, 2009-05-23 at 04:11 +0200, Karsten Bräckelmann wrote:
> What about some grep love, and splitting that up in at least less and
> greater than a total of score 15? See my post about 6 hours ago, and
> considerably more hits in the low-ish scoring spam.
> 
> 
> > Spam:  192
> > (thats a total count since 3 May)
> > 
> > Totals since last Thursday 14 May
> 
> >   Rule Name          Score   Ham   Spam   %of Ham   %of Spam
> > ------------------------------------------------------------
> >   EMAILBL_TEST_LEM    0.50     0     11     0.00%      5.73%
> 
> Sorry, no. :)  The dates and numbers don't match, unless you didn't get
> any spam early this month.
> 
>   192 * 0.0573 == 11.002
> 
> 
Is this what you're looking for:
Starting point as of 13 May with plug-in - Spam:  97

EMAILBL_TEST_LEM (score 0.50), day by day:

  Date     Spam   Total   Ham hits   Spam hits   %of Ham   %of Spam
  ------------------------------------------------------------------
  14 May      8     105          0           1     0.00%      0.95%
  15 May      8     113          0           3     0.00%      2.65%
  16 May      7     120          0           3     0.00%      2.50%
  17 May      3     123          0           3     0.00%      2.44%
  18 May     12     135          0           5     0.00%      3.70%
  19 May     28     163          0          10     0.00%      6.13%
  20 May     19     182          0          10     0.00%      5.49%
  21 May     10     192          0          11     0.00%      5.73%

Those are not the total spam for the day but the cumulative spam from
one day to the next. If the percentage is instead figured on the spam
received since I started running the plug-in, it's 11 hits out of 95
spam, or 11.58% of spam for that period. I must have deleted your earlier
post so I can't refer back to it. If this is not what you're referring to,
I guess I'm not understanding you correctly.
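
The difference between the two percentages comes down to which spam total is used as the denominator. A minimal Python sketch of that arithmetic follows; the function name `hit_rate_since` is made up for illustration, and the inputs are the figures from this message (97 spam before the plug-in was enabled, 192 cumulative spam on 21 May, 11 hits).

```python
def hit_rate_since(baseline_spam, cumulative_spam, hits):
    """Percentage of spam hit by a rule, counting only spam seen after the baseline."""
    new_spam = cumulative_spam - baseline_spam
    if new_spam <= 0:
        raise ValueError("no spam received since the baseline")
    return 100.0 * hits / new_spam

# Only spam received while the plug-in was running: 192 - 97 = 95 messages.
print(round(hit_rate_since(97, 192, 11), 2))

# All spam since 3 May, which is what the stats script reports.
print(round(hit_rate_since(0, 192, 11), 2))
```

The first figure answers "how often does the rule hit while it is enabled"; the second dilutes it with spam that arrived before the rule existed.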

Chris

-- 
KeyID 0xE372A7DA98E6705C





Re: EmailBL stats

2009-05-22 Thread Karsten Bräckelmann
What about some grep love, and splitting that up in at least less and
greater than a total of score 15? See my post about 6 hours ago, and
considerably more hits in the low-ish scoring spam.


> Spam:  192
> (thats a total count since 3 May)
> 
> Totals since last Thursday 14 May

>   Rule Name          Score   Ham   Spam   %of Ham   %of Spam
> ------------------------------------------------------------
>   EMAILBL_TEST_LEM    0.50     0     11     0.00%      5.73%

Sorry, no. :)  The dates and numbers don't match, unless you didn't get
any spam early this month.

  192 * 0.0573 == 11.002


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



EmailBL stats

2009-05-22 Thread Chris
Ham:   329
Spam:  192
(thats a total count since 3 May)

Totals since last Thursday 14 May

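EmailBL.cf:
  Rule Name          Score   Ham   Spam   %of Ham   %of Spam
  ------------------------------------------------------------
  EMAILBL_TEST_LEM    0.50     0     11     0.00%      5.73%
  ------------------------------------------------------------
  OVERALL                      0     11     0.00%      5.73%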


-- 
KeyID 0xE372A7DA98E6705C





Stats (was: The EmailBL test zone period has been extended to July 1st.)

2009-05-22 Thread Karsten Bräckelmann
> The EmailBL test zone period has been extended to July 1st.

As promised, here are some results from me, now that I got some half-
decent spam throughput. Not an ISP, not a company. Have been running the
original cf for 5 days, then updated. Since then another 5 days passed.

   8.7%  hits in spam scoring 05-10
  11.5%  hits in spam scoring 10-15

  Hit 1 out of 3 FNs (spam, score < 5).  No hits in ham.

Overall hitrate. Numbers are even better just looking at the last 5 days
worth, after the update. Less than 1% hits in the spam scoring >= 15,
but that's entirely perfect. :)

About half of the hits in spam < 15 also are caught by SOUGHT_FRAUD.
That overlap was to be expected.

An interesting observation is, that the hitrate (in percent) in spam
scoring < 15 is an order of magnitude higher than with high-scoring [1]
spam. This is rare to find...
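
For anyone wanting to reproduce this kind of per-band breakdown, here is a minimal Python sketch. It assumes you have already extracted, for each spam message, its total score and the set of rules that fired; the names `band` and `hit_rate_by_band` and the sample data are hypothetical, not my actual tooling or numbers.

```python
from collections import defaultdict

# Upper bounds of the score bands used above; anything else falls into ">=15".
BANDS = [(5, "<5"), (10, "05-10"), (15, "10-15")]

def band(score):
    """Map a total spam score to its band label."""
    for upper, label in BANDS:
        if score < upper:
            return label
    return ">=15"

def hit_rate_by_band(messages, rule):
    """messages: iterable of (score, set_of_rule_names). Returns {band: percent hit}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for score, rules in messages:
        label = band(score)
        totals[label] += 1
        if rule in rules:
            hits[label] += 1
    return {label: 100.0 * hits[label] / totals[label] for label in totals}

# Invented sample data, for illustration only.
sample = [
    (3.2, {"EMAILBL_TEST_LEM"}),
    (7.5, {"EMAILBL_TEST_LEM", "HTML_MESSAGE"}),
    (8.1, {"HTML_MESSAGE"}),
    (12.0, {"EMAILBL_TEST_LEM"}),
    (22.4, {"BAYES_99"}),
    (31.0, {"BAYES_99"}),
]
rates = hit_rate_by_band(sample, "EMAILBL_TEST_LEM")
for label in ("<5", "05-10", "10-15", ">=15"):
    print(f"{rates[label]:6.1f}%  hits in spam scoring {label}")
```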


Looks really good to me, guys! Great job.  Can I keep it? :)

  guenther


[1] Which accounts for the bulk of my spam with > 98.5%. The goal is to
find ways to score the remaining 1.5%.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



EmailBL Stats

2009-05-20 Thread Chris
Ham:   294
Spam:  163

EmailBL.cf:
  Rule Name  Score Ham   Spam   %of Ham   %of Spam

---
  EMAILBL_TEST_LEM   0.50  0 10 0.00%  6.13%

---
  OVERALL  0 10 0.00%  6.13%

As of yesterday at midnight.

-- 
KeyID 0xE372A7DA98E6705C





Re: EmailBl Stats

2009-05-18 Thread Jason Haar
Well, since we're all doing show-and-tell: in the past 24 hours, 2310
emails have triggered the EMAILBL* rules, of which (with the default
0.5 score) 70 were FNs (false negatives).

i.e. if I increased the score to 2, all 70 of those would have been
marked as spam (and I checked: they were spam)
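
That kind of what-if check can be sketched in a few lines of Python. This is an illustration only: `newly_caught` is a made-up helper, the threshold of 5.0 is SpamAssassin's default required score and may not match a given site, and the sample messages are invented.

```python
def newly_caught(messages, rule, old_score, new_score, required=5.0):
    """Messages below the threshold now that would reach it if `rule` scored new_score.

    Each message's "score" is its current total, including the rule's old score.
    """
    delta = new_score - old_score
    return [m for m in messages
            if rule in m["rules"] and m["score"] < required <= m["score"] + delta]

# Invented sample data, for illustration only.
sample = [
    {"id": "a", "score": 4.0, "rules": {"EMAILBL_TEST_LEM"}},  # 4.0 + 1.5 reaches 5.0
    {"id": "b", "score": 2.0, "rules": {"EMAILBL_TEST_LEM"}},  # 3.5, still below
    {"id": "c", "score": 4.5, "rules": {"HTML_MESSAGE"}},      # rule did not fire
    {"id": "d", "score": 9.0, "rules": {"EMAILBL_TEST_LEM"}},  # already marked as spam
]
flipped = newly_caught(sample, "EMAILBL_TEST_LEM", old_score=0.5, new_score=2.0)
print([m["id"] for m in flipped])
```

Running the same function over ham would show how many FPs the higher score would cost.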

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



Re: EmailBl Stats

2009-05-18 Thread DAve

Karsten Bräckelmann wrote:
> On Mon, 2009-05-18 at 10:50 -0400, DAve wrote:
>> I will see about the update, for now the last five days stats are as 
>> follows.
>>
>> Total mail through SA = 208,498
>> Total spam messages tagged with EMAILBL = 1471
>> Total non spam messages tagged with EMAILBL = 128
>
> What exactly are these?
>
>> FP seen = 0
>
> By FP, do you mean a mail being flagged as spam (total result)? As
> opposed to per-rule FPs.


Sorry, I could have been more clear.

Total spam messages tagged with EMAILBL = Messages that were spam, 
reached a spam score, and were tagged by EMAILBL.

Total non spam messages tagged with EMAILBL = Messages that were spam 
but did not reach a spam score (FNs), and were tagged by EMAILBL.

FP seen = Messages that were not spam, did not reach a spam score, and 
were tagged by EMAILBL.


DAve


--
"Posterity, you will know how much it cost the present generation to
preserve your freedom.  I hope you will make good use of it.  If you
do not, I shall repent in heaven that ever I took half the pains to
preserve it." John Quincy Adams

http://appleseedinfo.org



Re: EmailBl Stats

2009-05-18 Thread Karsten Bräckelmann
On Mon, 2009-05-18 at 10:50 -0400, DAve wrote:
> I will see about the update, for now the last five days stats are as 
> follows.
> 
> Total mail through SA = 208,498
> Total spam messages tagged with EMAILBL = 1471
> Total non spam messages tagged with EMAILBL = 128

What exactly are these?

> FP seen = 0

By FP, do you mean a mail being flagged as spam (total result)? As
opposed to per-rule FPs.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: EmailBl Stats

2009-05-18 Thread Steve Freegard
Henrik K wrote:
> On Sat, May 16, 2009 at 08:25:58AM -0500, Chris wrote:
>> Started running the plug-in Thursday and though I don't get much spam a
>> day I am getting hits:
>>
>> Ham:   232
>> Spam:  113
>> (thats a total count since 3 May)
>>
>> EmailBL.cf:
>>   Rule Name Score Ham   Spam   %of Ham   %of Spam
>>
>> ---
>>   EMAILBL_TEST_LEM  0.50  0  3 0.00%  2.65%
>>
>> ---
>>   OVERALL 0  3 0.00%  2.65%
> 
> It's still some hits for such low volume sample. ;) Everyone keep in mind
> that the current domain list is pretty small (though they should be some of
> the most popular ones). There's been few added, so you might want to download
> it daily:
> 

Slightly different set of stats from me as I'm not running it in SA but
rejecting at SMTP time.  Here's the stats from the last two hours when I
enabled the querying of the emailbl:

214-2.0.0 080 mail-bl-mail=655 1.73%

That's 1.73% of every MAIL FROM: seen by this machine.  As you would
expect, a high proportion of these come from hosts already listed on
client IP DNSBLs.

214-2.0.0 161 mail-bl-hdr=48 0.96%
214-2.0.0 162 mail-bl-body=10 0.20%

Those percentages are out of the total number of messages input.

No FPs here as expected.

Regards,
Steve.


Re: EmailBl Stats

2009-05-18 Thread Art Greenberg
I installed the plugin last Tuesday. As of this morning (using the 
original domain list):


Total Messages Processed: 2933
Number identified as spam: 2464
Total number tagged by EMAILBL: 7
Number of FNs tagged by EMAILBL: 2

The two FNs scored a 3. So if EMAILBL had enough weight, SA would have 
counted two FNs as spam. Both messages came from the same source.


I just installed the updated domain list.

This is a small personal mail server for my wife and me - not an ISP. 
Yeah, we get a lot of spam. And SA is doing a really great job with the 
vast majority of it. I do average maybe 2 or 3 FNs a day, and so far, no 
FPs since installing SA about 3 months ago. I've been pretty happy with SA 
as-is, especially given the FP rate. But I am curious to see what can be 
done to drive down the FN rate without bringing up the FP rate - and so 
far, EMAILBL seems like a small step in the correct direction.


--
Art Greenberg
a...@eclipse.net


Re: EmailBl Stats

2009-05-18 Thread DAve

Henrik K wrote:
> On Sat, May 16, 2009 at 08:25:58AM -0500, Chris wrote:
>> Started running the plug-in Thursday and though I don't get much spam a
>> day I am getting hits:
>>
>> Ham:   232
>> Spam:  113
>> (thats a total count since 3 May)
>>
>> EmailBL.cf:
>>   Rule Name          Score   Ham   Spam   %of Ham   %of Spam
>>   ------------------------------------------------------------
>>   EMAILBL_TEST_LEM    0.50     0      3     0.00%      2.65%
>>   ------------------------------------------------------------
>>   OVERALL                      0      3     0.00%      2.65%
>
> It's still some hits for such low volume sample. ;) Everyone keep in mind
> that the current domain list is pretty small (though they should be some of
> the most popular ones). There's been few added, so you might want to download
> it daily:
>
> http://sa.hege.li/emailbl_lemfreemail.cf


I will see about the update, for now the last five days stats are as 
follows.


Total mail through SA = 208,498
Total spam messages tagged with EMAILBL = 1471
Total non spam messages tagged with EMAILBL = 128
FP seen = 0
(fully 80% of our traffic never gets to SA)

DAve

--
"Posterity, you will know how much it cost the present generation to
preserve your freedom.  I hope you will make good use of it.  If you
do not, I shall repent in heaven that ever I took half the pains to
preserve it." John Quincy Adams

http://appleseedinfo.org



Re: EmailBl Stats

2009-05-16 Thread Henrik K
On Sat, May 16, 2009 at 08:25:58AM -0500, Chris wrote:
> Started running the plug-in Thursday and though I don't get much spam a
> day I am getting hits:
> 
> Ham:   232
> Spam:  113
> (thats a total count since 3 May)
> 
> EmailBL.cf:
>   Rule Name Score Ham   Spam   %of Ham   %of Spam
> 
> ---
>   EMAILBL_TEST_LEM  0.50  0  3 0.00%  2.65%
> 
> ---
>   OVERALL 0  3 0.00%  2.65%

That's still a fair number of hits for such a low-volume sample. ;) Everyone
keep in mind that the current domain list is pretty small (though it should
contain some of the most popular ones). A few have been added, so you might
want to download it daily:

http://sa.hege.li/emailbl_lemfreemail.cf

Cheers,
Henrik


EmailBl Stats

2009-05-16 Thread Chris
Started running the plug-in Thursday and though I don't get much spam a
day I am getting hits:

Ham:   232
Spam:  113
(thats a total count since 3 May)

EmailBL.cf:
  Rule Name Score Ham   Spam   %of Ham   %of Spam

---
  EMAILBL_TEST_LEM  0.50  0  3 0.00%  2.65%

---
  OVERALL 0  3 0.00%  2.65%

-- 
KeyID 0xE372A7DA98E6705C





Re: SA rules stats (Was: SARE false positives on MY_CID_* rules)

2009-01-31 Thread Chris
On Thursday 29 January 2009 23:33:49 Rajkumar S wrote:
> 2009/1/30 Stefan Jakobs 
>
> > After activating the rule I haven't seen any more FP. But that doesn't
> > mean much. Here are my stats from yesterday:
> >
> >  Rank Hits% Msgs   % Spam% Ham  Score Rule
> >   --   ---  - 
> >  3472 0.01%0.06%0.22%   1.46 MY_CID_AND_ARIAL2
> >  3711 0.01%0.03%0.02%   1.54 MY_CID_AND_STYLE
> > 0 0.01%0.00%0.02%   1.58 MY_CID_ARIAL_STYLE
>
> Hi,
>
> How did you generate this stats ?
>
> raj
There are two scripts I run, one being sastats and the other being sa-addon 
stats:

Email:       34  Autolearn:     0  AvgScore:   3.88  AvgScanTime:  9.85 sec
Spam:        13  Autolearn:     0  AvgScore:  18.15  AvgScanTime:  7.89 sec
Ham:         21  Autolearn:     0  AvgScore:  -4.95  AvgScanTime: 11.06 sec

Time Spent Running SA:         0.09 hours
Time Spent Processing Spam:    0.03 hours
Time Spent Processing Ham:     0.06 hours

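TOP SPAM RULES FIRED
------------------------------------------------------------
RANK  RULE NAME               COUNT  %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   1  SAGREY                     12    35.29   92.31    0.00
   2  HTML_MESSAGE               10    64.71   76.92   57.14
   3  DCC_CHECK_NEGATIVE          9    58.82   69.23   52.38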

That was the sastats output, the sa-addon output is below:

Total: 247
Ham:   182
Spam:  65

FreeMail.cf:
  Rule Name                     Score     Ham   Spam   %of Ham   %of Spam
  ---
  FREEMAIL_REPLYTO               2.00      0     16     0.00%     24.62%
  FREEMAIL_FROM                  1.00      5     23     2.75%     35.38%
  ---
  OVERALL                                  5     23     2.75%     35.38%

clamav.cf:
  Rule Name                     Score     Ham   Spam   %of Ham   %of Spam
  ---
  CLAMAV                        10.00      2       19      1.10%    29.23%
  ---
  OVERALL                                  2     19     1.10%     29.23%

And so on

-- 
KeyID 0xE372A7DA98E6705C





Re: SA rules stats (Was: SARE false positives on MY_CID_* rules)

2009-01-31 Thread Stefan Jakobs
On Freitag, 30. Januar 2009 06:33:49 Rajkumar S wrote:
> > After activating the rule I haven't seen any more FP. But that doesn't
> > mean much. Here are my stats from yesterday:
> >
> >  Rank Hits% Msgs   % Spam% Ham  Score Rule
> >   --   ---  - 
> >  3472 0.01%0.06%0.22%   1.46 MY_CID_AND_ARIAL2
> >  3711 0.01%0.03%0.02%   1.54 MY_CID_AND_STYLE
> > 0 0.01%0.00%0.02%   1.58 MY_CID_ARIAL_STYLE
>
> Hi,
>
> How did you generate this stats ?

I use amavisd-new with spamassassin. So I used amavisd-logwatch to generate 
these stats. See: http://www.mikecappella.com/logwatch/

> raj

Greetings
Stefan





SA rules stats (Was: SARE false positives on MY_CID_* rules)

2009-01-29 Thread Rajkumar S
2009/1/30 Stefan Jakobs 
> After activating the rule I haven't seen any more FP. But that doesn't mean
> much. Here are my stats from yesterday:
>
>  Rank Hits% Msgs   % Spam% Ham  Score Rule
>   --   ---  - 
>  3472 0.01%0.06%0.22%   1.46 MY_CID_AND_ARIAL2
>  3711 0.01%0.03%0.02%   1.54 MY_CID_AND_STYLE
> 0 0.01%0.00%0.02%   1.58 MY_CID_ARIAL_STYLE

Hi,

How did you generate this stats ?

raj


Re: Not a reply: spamassassin stats (was Re: Tuning the bayes-system?)

2008-10-21 Thread Heinrich Christian Peters
Hi,

Koopmann, Jan-Peter wrote:
> can you share your new script with the MailScanner changes with us?

Of course I can... But the script only works with German reports [1],
so you will have to change it. I am no Perl guru, so changes are welcome!

You can find the script here:



[1]:
> X-heinrich-peters.zz-MailScanner-SpamCheck: not spam,
>   SpamAssassin (nicht zwischen gespeichert, Wertung=-5.563,
>   benoetigt 5, autolearn=not spam, AWL -0.66, BAYES_00 -4.90,
>   NO_RELAYS -0.00)



RE: Re: Not a reply: spamassassin stats (was Re: Tuning the bayes-system?)

2008-10-21 Thread Koopmann, Jan-Peter
Hi,

can you share your new script with the MailScanner changes with us?

Kind regards,
  JP



Re: Not a reply: spamassassin stats (was Re: Tuning the bayes-system?)

2008-10-21 Thread Heinrich Christian Peters
Hello Mathias,

I am using a variant of the sa-stats script:
<http://www.rulesemporium.com/programs/sa-stats-1.0.txt>

I had to change some things to get it to work with my MailScanner setup.

Bye,
Heiner

Mathias Homann wrote:
> On Tuesday 21 October 2008, Heinrich Christian Peters wrote:
> 
>>  [... my SpamAssassin Statistics ...]
> 
> What are you using to generate those stats?
> I'd like to have that on my server as well.
> 
> 
> bye,
> MH
> 
> 



Not a reply: spamassassin stats (was Re: Tuning the bayes-system?)

2008-10-21 Thread Mathias Homann
On Tuesday 21 October 2008, Heinrich Christian Peters wrote:

> Email:  3870  Autolearn:  3575  Cached:   126  AvgScore:  29.35
> Spam:   3562  Autolearn:  3411  Cached:   113  AvgScore:  32.22
> Ham:     308  Autolearn:   164  Cached:    13  AvgScore:  -3.81
>
> TOP SPAM RULES FIRED (50/432)
> =================================================================
> RANK  RULE NAME  SCORE  COUNT  %OFMAIL  %OFSPAM  %OFHAM   BAYES
> -----------------------------------------------------------------
>   18  BAYES_99    5.40   1970    50.90    55.31    0.00  100.00
>   27  BAYES_50    0.00    801    21.16    22.49    5.84   97.80
>   41  BAYES_80    2.00    259     6.69     7.27    0.00  100.00
>   43  BAYES_95    3.00    248     6.41     6.96    0.00  100.00
>
> TOP HAM RULES FIRED (50/78)
> =================================================================
> RANK  RULE NAME  SCORE  COUNT  %OFMAIL  %OFSPAM  %OFHAM   BAYES
> -----------------------------------------------------------------
>    2  BAYES_00   -4.90    253     8.01     1.60   82.14   81.61
>   12  BAYES_50    0.00     18    21.16    22.49    5.84    2.20
>   21  BAYES_40   -0.18      8     0.93     0.79    2.60   22.22
>   32  BAYES_05   -1.11      4     0.62     0.56    1.30   16.67
>   43  BAYES_20   -0.74      3     0.65     0.62    0.97   12.00
>
> conf:
> bayes_expiry_max_db_size 150
> bayes_auto_learn_threshold_spam 7.5

What are you using to generate those stats?
I'd like to have that on my server as well.


bye,
MH


-- 
gpg key fingerprint: 5F64 4C92 9B77 DE37 D184  C5F9 B013 44E7 27BD 763C


Giving Back--A stats script I wrote

2008-08-02 Thread Skip
This may be kinda simple for you gurus out there, in which case I 
welcome your feedback and suggestions to make this better.  But if 
anyone finds this useful...great!


I wanted a stats tool that would tell me what rules were hit on the 
most.  Which ones ONLY trigger on spam and which ones ONLY trigger on 
HAM?  I wanted to know what percentage of my HAM was whitelisted.  Do I 
have my rule scores set high or low enough and do I have the required 
score for the SPAM threshold at the right place?  I wanted something 
that was flexible and powerful.  So I thought about ways to get my 
spamassassin data into mysql.  Look at this screenshot and you'll get 
the idea:


http://pelorus.org/pictures/mailstats.gif

Obviously, with that type of granularity, I could generate any kind of 
report I wanted. 

The way I do it is I generate a few custom headers in procmail to make 
things easier, and I have a couple of special SA headers added, again, 
to make things easier.  Then I pipe a carbon copy of each email through 
this bash script which parses it and puts all the data into mysql.  I 
just finished it today, so I don't have any pretty charts or anything 
yet, but I do think it will meet my needs.
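
To give a rough idea of the approach without the full script, here is a minimal Python sketch of the same pipeline. It is not my bash script: it swaps in Python and an in-memory SQLite database for bash and MySQL, it reads SpamAssassin's standard X-Spam-Status header instead of my custom headers, and the table layout, function names and sample messages are all made up for illustration.

```python
import email
import re
import sqlite3

def parse_spam_status(raw_message):
    """Return (is_spam, score, required, rules) from X-Spam-Status, or None."""
    header = email.message_from_string(raw_message).get("X-Spam-Status")
    if header is None:
        return None
    header = " ".join(str(header).split())   # undo header folding
    status = re.match(r"(Yes|No), score=(-?[\d.]+) required=(-?[\d.]+)", header)
    if status is None:
        return None
    tests = re.search(r"tests=(.*?)(?: \w+=|$)", header)
    names = tests.group(1).split(",") if tests else []
    # "none" is treated as "no rules fired" (assumed placeholder value).
    rules = [n.strip() for n in names if n.strip() and n.strip() != "none"]
    return (status.group(1) == "Yes", float(status.group(2)),
            float(status.group(3)), rules)

def store(db, raw_message):
    """Insert one message and its rule hits; returns False if nothing was parsed."""
    parsed = parse_spam_status(raw_message)
    if parsed is None:
        return False
    is_spam, score, required, rules = parsed
    cur = db.execute(
        "INSERT INTO messages (is_spam, score, required) VALUES (?, ?, ?)",
        (int(is_spam), score, required))
    db.executemany("INSERT INTO hits (message_id, rule) VALUES (?, ?)",
                   [(cur.lastrowid, rule) for rule in rules])
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, is_spam INTEGER,"
           " score REAL, required REAL)")
db.execute("CREATE TABLE hits (message_id INTEGER, rule TEXT)")

# Invented sample messages, for illustration only.
SAMPLE_SPAM = ("X-Spam-Status: Yes, score=18.1 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
               "\tRAZOR2_CHECK autolearn=no version=3.2.5\nSubject: hi\n\nbody\n")
SAMPLE_HAM = ("X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,HTML_MESSAGE\n"
              "\tautolearn=ham version=3.2.5\nSubject: hello\n\nbody\n")
store(db, SAMPLE_SPAM)
store(db, SAMPLE_HAM)

# Which rules have ONLY ever triggered on spam?
spam_only = [row[0] for row in db.execute(
    "SELECT rule FROM hits JOIN messages ON messages.id = hits.message_id "
    "GROUP BY rule HAVING MIN(is_spam) = 1 ORDER BY rule")]
print(spam_only)
```

Once the hits are in a table like this, the other questions (ham-only rules, share of whitelisted ham, score distribution around the threshold) become one query each.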


I did look at some of the other data collection utilities out there, but 
I didn't see any that were quite this flexible, if I do say so myself.  
Perhaps I am mistaken and there is one (or more) that can do what this 
does and more.


Here's the script, along with many (helpful, I hope) comments.
http://pastebin.com/f743e7daa

Like I said, if any of you smart guys out there see ways to improve 
this, I sure would appreciate the feedback.


Thanks.

Skip

--
Get my PGP Public key here:
http://pelorus.org/[EMAIL PROTECTED]



Re: rDNS none in stats with IPv6

2008-05-30 Thread Steve Bertrand

SpamAssassin doesn't perform DNS lookups on the Received headers if
at all possible -- it's assumed that your MTA will do that in advance.


Thanks for that. I found this out late last night, and I believe I've 
got the issue resolved.


Regards,

Steve


Re: rDNS none in stats with IPv6

2008-05-30 Thread Justin Mason

Steve Bertrand writes:
> I've added debugging code to new_dns_packet() and bgsend() 
> (DnsResolver.pm) to print out $host, $type and $class to a log file.
> 
> What I found is that the mapped address entries are not even seen by 
> DnsResolver.pm at all, hence, there is no DNS lookup even attempted on them.
> 
> I'm off to find out where exactly the evaluation/gathering of the IP 
> addresses takes place, and try to design a regex that will take the 
> ::ffff: prefix into consideration properly.
> 
> What I'd like to have happen is the mapped address sent merrily along 
> all the way to the system resolver, then have the system resolver do 
> what needs to be done.
> 
> Am I taking the right approach here? Or should I have the IPv4 address 
> stripped out of the v6 mapped address prior to pushing it through the 
> Perl resolver gateways?

SpamAssassin doesn't perform DNS lookups on the Received headers if
at all possible -- it's assumed that your MTA will do that in advance.

--j.


Re: rDNS none in stats with IPv6

2008-05-29 Thread Greg Troxel
First, I would advise you not to use mapped addresses unless you really
need to use them.  On BSD, there's a sysctl to control whether v4
connections will match v6 sockets:

 net.inet6.ip6.v6only = 1

Best practice seems to be to have daemons open a v4 and v6 socket
separately, and avoid mapped addresses.  This will get you out of
inverse resolving v6 ipv4-mapped addresses, and get you out of teaching
SA to extract v4 addresses for checks from the mapped addresses.

Then, there's the issue about getting your MTA to resolve v6 addresses.

  To be honest, I think that the work should focus on fixing the
  resolver (or whatever calls the resolver) to extract the IPv4 address
  out of the mapped address, instead of eliminating the mapped address
  entirely. There are legitimate needs to use mapped addresses.

Well, you are of course welcome to that.  I think it will prove harder
than avoiding mapped addresses.


Re: rDNS none in stats with IPv6

2008-05-29 Thread Steve Bertrand

Steve Bertrand wrote:
> I've added debugging code to new_dns_packet() and bgsend() 
> (DnsResolver.pm) to print out $host, $type and $class to a log file.
>
> What I found is that the mapped address entries are not even seen by 
> DnsResolver.pm at all, hence, there is no DNS lookup even attempted on 
> them.

Hmmm... what's worse, I just found out that *NO* IPv6 addresses 
are being seen by DnsResolver.pm at all.


Steve


Re: rDNS none in stats with IPv6

2008-05-29 Thread Steve Bertrand
I've added debugging code to new_dns_packet() and bgsend() 
(DnsResolver.pm) to print out $host, $type and $class to a log file.


What I found is that the mapped address entries are not even seen by 
DnsResolver.pm at all, hence, there is no DNS lookup even attempted on them.


I'm off to find out where exactly the evaluation/gathering of the IP 
addresses takes place, and try to design a regex that will take the 
::ffff: prefix into consideration properly.


What I'd like to have happen is the mapped address sent merrily along 
all the way to the system resolver, then have the system resolver do 
what needs to be done.


Am I taking the right approach here? Or should I have the IPv4 address 
stripped out of the v6 mapped address prior to pushing it through the 
Perl resolver gateways?


Steve





Re: rDNS none in stats with IPv6

2008-05-29 Thread Steve Bertrand


> Hmmm...just out of curiosity, what is the first entry below used for, if 
> Resolver.pm is used for header checks?
>
> pearl# locate Resolver.pm
>
> /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm
> /usr/local/lib/perl5/site_perl/5.8.8/mach/Net/DNS/Resolver.pm

...nevermind, sorry for the noise.

Steve


Re: rDNS none in stats with IPv6

2008-05-29 Thread Steve Bertrand
>> Received: from unknown (HELO mail.apache.org) (::ffff:140.211.11.2)
>> by pearl.ibctech.ca with SMTP; 28 May 2008 09:13:00 -
>>
>> Can someone inform me if this is an SA thing, and if so, where to 
>> begin looking/testing with the source to correct this issue?
>
> The Received headers are parsed in Received.pm.

Hmmm...just out of curiosity, what is the first entry below used for, if 
Resolver.pm is used for header checks?

pearl# locate Resolver.pm

/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm
/usr/local/lib/perl5/site_perl/5.8.8/mach/Net/DNS/Resolver.pm

Steve


Re: rDNS none in stats with IPv6

2008-05-29 Thread Steve Bertrand

Greg Troxel wrote:

  In my SA stats, the majority (+90%) of email inbound is classified as
  rdns_none.

  I have a suspicion that this is due to the IPv6-IPv4 mapped address
  being written into the headers when I am speaking to a non-native IPv6
  MTA:

  Received: from unknown (HELO mail.apache.org) (:::140.211.11.2)
  by pearl.ibctech.ca with SMTP; 28 May 2008 09:13:00 -




(I presume you are trying to make this server IPv6 only instead of dual
stack.  


...well, not intentionally. My intentions were/are to make this a fully 
dual-stacked machine that hosts my personal domain that is my first 
fully IPv6 compliant machine that I've configured.



When my machine had a globally routable v6 address I got some
mail over v6 and some over v4, but didn't used mapped addresses.)


Unfortunately, I'm not intently using mapped addresses. :)

I've got a hacked version of Qmail that uses Simscan to fire SA (at 
least I believe this is how it works).


I'll need to go through the Qmail sources to find out where it's writing 
these mapped addresses.


To be honest, I think that the work should focus on fixing the resolver 
(or whatever calls the resolver) to extract the IPv4 address out of the 
mapped address, instead of eliminating the mapped address entirely. 
There are legitimate needs to use mapped addresses.



It seems that your SMTP listener is not correctly doing reverse dns
lookups of mapped addresses,


How can I identify *exactly* what is my SMTP 'listener', and how DNS is 
called, and by what?



and I'm not sure what the right fix is.
Either the SMTP code should notice the mapped address, pull out the v4
address, and look it up, or the resolver should do this automatically.


I agree. I personally think that the mapped address should remain in the 
header however. Although I've never tested sending to a mapped address 
directly, I'll have to...it would be interesting to see how a return to 
a mapped address ends up if my IPv4 BGP peers go down, but my IPv6 stays up.



On my NetBSD 4 system (generally pretty hard core about this sort of
thing), 


Nice to meet you, I am very much as well (particularly IP and routing :)


"dig -x ::ffff:140.211.11.2" returns NXDOMAIN on a query of

;2.0.b.0.3.d.c.8.f.f.f.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa. IN 
PTR

so I'd guess that it's not a normal expectation for a resolver to
extract the mapped address.


No, I see the exact same thing via FBSD, but seems right. I've been 
going over the resolver code itself lately, so I'll have a look. Perhaps 
it could be fixed right there, and then the SMTP engine (or anything 
else that relies on DNS) could stay the same.



After the lookup issue is fixed, the received header would have the hostname.


This is why I didn't know if it were appropriate for the SA list... 
essentially, I would like to follow up on where in my infrastructure 
this is broken :)


Just think, I set out to set up a simple mail server on IPv6. While 
doing so, I've written more patches for software in the last week than I 
have my whole life...and I'm not even a programmer ;)


Thanks for the input.

Steve





Re: rDNS none in stats with IPv6

2008-05-28 Thread Steve Bertrand

Greg Troxel wrote:

  In my SA stats, the majority (+90%) of email inbound is classified as
  rdns_none.



(I presume you are trying to make this server IPv6 only instead of dual
stack.  When my machine had a globally routable v6 address I got some
mail over v6 and some over v4, but didn't use mapped addresses.)


When I get a few more minutes, I will go over the reply again, and reply 
properly.


I couldn't believe the response (on and off list) regarding help with 
IPv6 issues and issues in general.


I think that I'll be happy here ;)

Steve





Re: rDNS none in stats with IPv6

2008-05-28 Thread Greg Troxel
  In my SA stats, the majority (+90%) of email inbound is classified as
  rdns_none.

  I have a suspicion that this is due to the IPv6-IPv4 mapped address
  being written into the headers when I am speaking to a non-native IPv6
  MTA:

  Received: from unknown (HELO mail.apache.org) (::ffff:140.211.11.2)
  by pearl.ibctech.ca with SMTP; 28 May 2008 09:13:00 -

(I presume you are trying to make this server IPv6 only instead of dual
stack.  When my machine had a globally routable v6 address I got some
mail over v6 and some over v4, but didn't use mapped addresses.)

It seems that your SMTP listener is not correctly doing reverse dns
lookups of mapped addresses, and I'm not sure what the right fix is.
Either the SMTP code should notice the mapped address, pull out the v4
address, and look it up, or the resolver should do this automatically.

On my NetBSD 4 system (generally pretty hard core about this sort of
thing), "dig -x ::ffff:140.211.11.2" returns NXDOMAIN on a query of

;2.0.b.0.3.d.c.8.f.f.f.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa. IN 
PTR

so I'd guess that it's not a normal expectation for a resolver to
extract the mapped address.
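
To make the two candidate lookups concrete, here is a small sketch using Python's standard ipaddress module (an illustration only, unrelated to how dig or any resolver is implemented). It assumes the address in question is ::ffff:140.211.11.2, which is what the f.f.f.f nibbles in the query name above imply.

```python
import ipaddress

addr = ipaddress.IPv6Address("::ffff:140.211.11.2")

# Name queried when the mapped address is treated as plain IPv6;
# this is the ip6.arpa name shown in the dig output above.
ptr_as_ipv6 = addr.reverse_pointer

# Name queried when the embedded IPv4 address is extracted first.
ptr_as_ipv4 = addr.ipv4_mapped.reverse_pointer

print(ptr_as_ipv6)
print(ptr_as_ipv4)  # 2.11.211.140.in-addr.arpa
```

Only the second name lives in the in-addr.arpa tree where an IPv4 host would normally have its PTR record, which is consistent with the NXDOMAIN seen for the first.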

After the lookup issue is fixed, the received header would have the hostname.

From looking at Received.pm, I don't see that SA is trying to do DNS
lookups; rdns_none seems to be about the MTA not having succeeded at
rdns lookup, not SA checking it later.  But if SA does look it up,
teaching it about mapped addresses might be needed too.




Re: rDNS none in stats with IPv6

2008-05-28 Thread SM

Hi Steve,
At 06:28 28-05-2008, Steve Bertrand wrote:

This may not be the appropriate list, but I'm hoping someone can help me.


It is the appropriate list.

I have an email server based on Matt Simerson's mail toaster 
(http://www.tnpi.biz/internet/mail/toaster/) that I've managed to 
get IPv6 compliant.


However, I'm having a very hard time determining exactly where the 
DNS checks are performed, and how to correct an issue.


In my SA stats, the majority (+90%) of email inbound is classified 
as rdns_none.


I have a suspicion that this is due to the IPv6-IPv4 mapped address 
being written into the headers when I am speaking to a non-native IPv6 MTA:


Received: from unknown (HELO mail.apache.org) 
(::ffff:140.211.11.2)  by pearl.ibctech.ca with SMTP; 28 May 2008 
09:13:00 -


Can someone inform me if this is an SA thing, and if so, where to 
begin looking/testing with the source to correct this issue?


According to your header, there is no reverse DNS for that mail server.

If it is within a part of SpamAssassin, I will gladly submit any 
patches that identify/rectify my problem.


The Received headers are parsed in Received.pm.
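
For anyone wanting to test headers outside of SA, the following is a minimal sketch of pulling the relevant fields out of a qmail-style Received line. It is not the logic in Received.pm; the regular expression, the function name parse_received, and the idea that the word "unknown" marks a failed reverse lookup are assumptions based only on the header quoted in this thread (with the address written as ::ffff:140.211.11.2).

```python
import re

# Matches the shape of the header quoted in this thread:
#   from <rdns-name> (HELO <helo-name>) (<ip>) by ...
QMAIL_RECEIVED = re.compile(
    r"from\s+(?P<rdns>\S+)\s+\(HELO\s+(?P<helo>[^)\s]+)\)\s+\((?P<ip>[^)\s]+)\)"
)

def parse_received(header):
    """Return the rdns, helo and ip fields, or None if the header does not match."""
    match = QMAIL_RECEIVED.search(header)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: the MTA writes "unknown" when it found no reverse DNS name.
    fields["rdns_missing"] = fields["rdns"] == "unknown"
    return fields

header = ("Received: from unknown (HELO mail.apache.org) "
          "(::ffff:140.211.11.2) by pearl.ibctech.ca with SMTP")
print(parse_received(header))
```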

Regards,
-sm 



rDNS none in stats with IPv6

2008-05-28 Thread Steve Bertrand

Hi everyone,

This may not be the appropriate list, but I'm hoping someone can help me.

I have an email server based on Matt Simerson's mail toaster 
(http://www.tnpi.biz/internet/mail/toaster/) that I've managed to get 
IPv6 compliant.


However, I'm having a very hard time determining exactly where the DNS 
checks are performed, and how to correct an issue.


In my SA stats, the majority (+90%) of email inbound is classified as 
rdns_none.


I have a suspicion that this is due to the IPv6-IPv4 mapped address 
being written into the headers when I am speaking to a non-native IPv6 MTA:


Received: from unknown (HELO mail.apache.org) (::ffff:140.211.11.2)  by 
pearl.ibctech.ca with SMTP; 28 May 2008 09:13:00 -


Can someone inform me if this is an SA thing, and if so, where to begin 
looking/testing with the source to correct this issue?


If it is within a part of SpamAssassin, I will gladly submit any patches 
that identify/rectify my problem.


Thanks, and regards,

Steve






Re: Fired rules stats understanding

2008-01-24 Thread Sébastien AVELINE

Matt Kettler a écrit :

Sébastien AVELINE wrote:

Hello,

You will find my top rules fired with SpamAssassin below.
I run SpamAssassin on several boxes, each with its own bayes_db 
files. I use razor, dcc_check, uribl, and bayes. We handle hundreds of 
thousands of messages per day.
In my top rules for spam you will see a lot of "collaborative rules" 
like razor, uribl, and dcc_check. I wonder why there aren't more 
heuristic and Bayesian rules in my top rules. Do you think my stats look 
"normal", or is there something wrong? Any suggestions are welcome.


It's really absurd that RDNS_NONE is firing off on 99.6% of email.

Do you not have RDNS for your own network, or is it generating invalid 
Received: headers?


Ahh, yeah, it looks like your own network lacks RDNS:

Received: from unknown (HELO ?192.168.0.213?)
([EMAIL PROTECTED]@82.235.12.159) by smtpp.alinto.net with SMTP; Thu,
24 Jan 2008 09:30:20 +


If you've got a local nameserver, you might want to generate an 
in-addr.arpa zone for the 192.168.0.* network to fix that.


As for the bayes, that doesn't surprise me. There's 10 different bayes 
rules, and while I'd expect that collectively they add up to most of 
your mail, it's not surprising that they're not individually scoring 
high. It's a little surprising BAYES_50 is doing so well compared to 
BAYES_99.. with the chi-squared combining I'd expect BAYES_99 to edge 
it out slightly. Are you doing any manual training? what's your 
"sa-learn --dump magic" look like?


Local address is from my office where I submit my mail to my 
mailservers. I think RDNS_NONE isn't the main worry. Unfortunately I 
don't use sa-learn to feed my bayes, I rely on high number of mails that 
come into my servers.

Is it really efficient to train the bayes manually?
Here you can see the result from sa-learn --dump magic:

0.000  0          3  0  non-token data: bayes db version
0.000  0    3803618  0  non-token data: nspam
0.000  0     862246  0  non-token data: nham
0.000  0     496111  0  non-token data: ntokens
0.000  0 1181735997  0  non-token data: oldest atime
0.000  0 1198170104  0  non-token data: newest atime
0.000  0 1181805393  0  non-token data: last journal sync atime
0.000  0 1181779437  0  non-token data: last expiry atime
0.000  0      43200  0  non-token data: last expire atime delta
0.000  0     476160  0  non-token data: last expire reduction count
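
The atime values in this dump are Unix timestamps, so they are easier to judge once converted to dates. The following sketch parses a few of the lines above and prints the oldest and newest atime in UTC. The function names are made up for illustration, and the parsing assumes the five-column layout shown in this thread.

```python
from datetime import datetime, timezone

def parse_dump_magic(lines):
    """Map each 'non-token data' label to its integer value."""
    values = {}
    for line in lines:
        if "non-token data:" not in line:
            continue
        # Columns: probability, spam count, value, ham count, description
        fields = line.split(None, 4)
        label = fields[4].split(":", 1)[1].strip()
        values[label] = int(fields[2])
    return values

def as_utc(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

dump = [
    "0.000  0    3803618  0  non-token data: nspam",
    "0.000  0     862246  0  non-token data: nham",
    "0.000  0 1181735997  0  non-token data: oldest atime",
    "0.000  0 1198170104  0  non-token data: newest atime",
]

stats = parse_dump_magic(dump)
for label in ("oldest atime", "newest atime"):
    print(label, as_utc(stats[label]).isoformat())
print("spam:ham ratio", round(stats["nspam"] / stats["nham"], 2))
```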





Re: Fired rules stats understanding

2008-01-24 Thread Matt Kettler

Sébastien AVELINE wrote:

Hello,

You will find my top rules fired with SpamAssassin below.
I run SpamAssassin on several boxes, each with its own bayes_db 
files. I use razor, dcc_check, uribl, and bayes. We handle hundreds of 
thousands of messages per day.
In my top rules for spam you will see a lot of "collaborative rules" 
like razor, uribl, and dcc_check. I wonder why there aren't more 
heuristic and Bayesian rules in my top rules. Do you think my stats look 
"normal", or is there something wrong? Any suggestions are welcome.


It's really absurd that RDNS_NONE is firing off on 99.6% of email.

Do you not have RDNS for your own network, or is it generating invalid 
Received: headers?


Ahh, yeah, it looks like your own network lacks RDNS:

Received: from unknown (HELO ?192.168.0.213?)
([EMAIL PROTECTED]@82.235.12.159) by smtpp.alinto.net with SMTP; Thu,
24 Jan 2008 09:30:20 +


If you've got a local nameserver, you might want to generate an 
in-addr.arpa zone for the 192.168.0.* network to fix that.
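
As a rough sketch of what the records in such a zone would look like, the snippet below builds a PTR line for the office address seen in the header above. The hostname office-pc.example.lan is purely hypothetical, and the output is only the record text; the surrounding zone file (SOA, NS records) depends on the nameserver in use.

```python
import ipaddress

def ptr_record(ip, hostname):
    """Build a fully qualified PTR record line for one IPv4 address."""
    owner = ipaddress.ip_address(ip).reverse_pointer
    return f"{owner}. IN PTR {hostname}."

# Hypothetical hostname, used only for illustration.
print(ptr_record("192.168.0.213", "office-pc.example.lan"))
# 213.0.168.192.in-addr.arpa. IN PTR office-pc.example.lan.
```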


As for the bayes, that doesn't surprise me. There's 10 different bayes 
rules, and while I'd expect that collectively they add up to most of 
your mail, it's not surprising that they're not individually scoring 
high. It's a little surprising BAYES_50 is doing so well compared to 
BAYES_99.. with the chi-squared combining I'd expect BAYES_99 to edge it 
out slightly. Are you doing any manual training? what's your "sa-learn 
--dump magic" look like?




Fired rules stats understanding

2008-01-24 Thread Sébastien AVELINE




Hello,

You will find my top rules fired with SpamAssassin below.
I run SpamAssassin on several boxes, each with its own bayes_db files.
I use razor, dcc_check, uribl, and bayes. We handle hundreds of thousands
of messages per day.
In my top rules for spam you will see a lot of "collaborative rules"
like razor, uribl, and dcc_check. I wonder why there aren't more heuristic
and Bayesian rules in my top rules. Do you think my stats look
"normal", or is there something wrong? Any suggestions are welcome.

Here are my top rules:

TOP SPAM RULES FIRED
--
RANK    RULE NAME   COUNT  %OFMAIL %OFSPAM  %OFHAM
--
   1    RDNS_NONE   48417    99.60   99.82   99.41
   2    RAZOR2_CHECK    42113    42.50   86.82    2.88
   3    RAZOR2_CF_RANGE_51_100  41657    41.46   85.88    1.75
   4    URIBL_BLACK 41376    41.43   85.30    2.22
   5    RAZOR2_CF_RANGE_E8_51_100   39016    38.41   80.44    0.85
   6    URIBL_JP_SURBL  38221    37.21   78.80    0.05
   7    URIBL_OB_SURBL  32588    32.22   67.18    0.97
   8    URIBL_SC_SURBL  30849    30.03   63.60    0.02
   9    DCC_CHECK   27472    28.92   56.64    4.14
  10    URIBL_AB_SURBL  26134    25.43   53.88    0.00
  11    HTML_MESSAGE    25531    60.93   52.63   68.35
  12    URIBL_WS_SURBL  23317    22.94   48.07    0.48
  13    DIGEST_MULTIPLE 23267    22.74   47.97    0.20
  14    URIBL_RHS_DOB   17797    17.42   36.69    0.20
  15    RAZOR2_CF_RANGE_E4_51_100   16500    16.55   34.02    0.94
  16    BAYES_50    13772    14.69   28.39    2.44
  17    RCVD_IN_BL_SPAMCOP_NET  13594    13.48   28.03    0.48
  18    BAYES_99    11330    11.06   23.36    0.07
  19    FORGED_MUA_OUTLOOK   9043 8.86   18.64    0.11
  20    STOX_REPLY_TYPE  8199 8.21   16.90    0.43
--

TOP HAM RULES FIRED
--
RANK    RULE NAME   COUNT  %OFMAIL %OFSPAM  %OFHAM
--
   1    RDNS_NONE   53945    99.60   99.82   99.41
   2    BAYES_00    43583    45.21    5.94   80.31
   3    HTML_MESSAGE    37089    60.93   52.63   68.35
   4    MIME_HTML_ONLY  10131    16.80   14.71   18.67
   5    MIME_QP_LONG_LINE    4754 5.78    2.45    8.76
   6    URIBL_GREY   3498 5.88    5.26    6.45
   7    HTML_IMAGE_RATIO_02  3053 3.82    1.79    5.63
   8    SUBJ_ALL_CAPS    2796 3.26    1.14    5.15
   9    SUBJECT_NEEDS_ENCODING   2520 2.77    0.68    4.64
  10    DCC_CHECK    2248    28.92   56.64    4.14
  11    MSGID_MULTIPLE_AT    2212 2.16    0.02    4.08
  12    INVALID_DATE 2130 4.26    4.63    3.93
  13    HTML_MIME_NO_HTML_TAG    1889 2.41    1.21    3.48
  14    MPART_ALT_DIFF   1744 2.66    2.04    3.21
  15    MIME_HTML_MOSTLY 1580 1.84    0.65    2.91
  16    RAZOR2_CHECK 1564    42.50   86.82    2.88
  17    UNPARSEABLE_RELAY    1563 1.78    0.55    2.88
  18    EXTRA_MPART_TYPE 1557 2.04    1.11    2.87
  19    HTML_IMAGE_RATIO_04  1455 2.17    1.59    2.68
  20    BAYES_50 1325    14.69   28.39    2.44
--

Thanks in advance.

Seb.




Re: Who can tell me where the latest sa-stats can be found.

2007-07-18 Thread Chris
On Monday 16 July 2007 9:47 pm, Dallas Engelken wrote:

>
>
> I haven't touched them for a while and haven't checked if v1.03 even works
> with SA 3.2.   If something needs to be done, let me know.

1.03 is working just fine here Dallas w/SA3.2.1

-- 
Chris
KeyID 0xE372A7DA98E6705C




Re: Who can tell me where the latest sa-stats can be found.

2007-07-16 Thread Dallas Engelken

Steven W. Orr wrote:
I used to use it but it's old and has bugs. I recently found out that 
it's *not* part of the sa distro. Is this still supported and if so, 
where do I get it?


I looked around and found hugely conflicting version info. e.g., 
version 0.93 seems to support sa-3.1.x but version 1.03 seems to be 
for sa-3.0.
(BTW, they both seem to be dated 2007-01-30 at 
http://rulesemporium.com/programs/)


what the hell are you reading?

http://rulesemporium.com/programs/sa-stats-1.0.txt   =  v1.03  is the 
latest, for SA 3.1


# version: 1.03
# author:  Dallas Engelken <[EMAIL PROTECTED]>
# desc:Generates Top Spam/Ham Rules fired for SA 3.1.x installations.


http://rulesemporium.com/programs/sa-stats.txt = v0.93, for  SA 3.0

# version: 0.93
# author:  Dallas Engelken <[EMAIL PROTECTED]>
# desc:Generates Top Spam/Ham Rules fired for SA 3.x installations.


I haven't touched them for a while and haven't checked if v1.03 even works 
with SA 3.2. If something needs to be done, let me know.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Who can tell me where the latest sa-stats can be found.

2007-07-16 Thread Steven W. Orr
I used to use it but it's old and has bugs. I recently found out that it's 
*not* part of the sa distro. Is this still supported and if so, where do I 
get it?


I looked around and found hugely conflicting version info. e.g., version 
0.93 seems to support sa-3.1.x but version 1.03 seems to be for sa-3.0.
(BTW, they both seem to be dated 2007-01-30 at 
http://rulesemporium.com/programs/)

Then I found a version 1.17 at 
http://apthorpe.cynistar.net/code/sa-contrib/sa-stats.html


so I'm pretty confused.

TIA

--
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net


Re: sa-stats and no spamd logs.

2007-05-10 Thread Luis Hernán Otegui

Hi, try Amavis Logwatch, by Mike Cappella. It's working great here, and
you could run it from logwatch, or standalone:

http://www.mikecappella.com/logwatch

It's pretty straightforward to install and run, and it gives you lots
of info about Amavis performance, as well as antivirus & antispam
statistics...


Luix

2007/5/10, mbano <[EMAIL PROTECTED]>:


HI,

Is there a way to extract statistics from SpamAssassin, as sa-stats
does, when spamd is not used (so there are no logs in spamd format)
and SpamAssassin is called from amavisd-new instead?
Does anybody have a similar need?

Or .. logs in sql and php...

thanks in advance
--
View this message in context: 
http://www.nabble.com/sa-stats-and-no-spamd-logs.-tf3722909.html#a10417475
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.





--
-
GNU-GPL: "May The Source Be With You...
Linux Registered User #448382.
-


sa-stats and no spamd logs.

2007-05-10 Thread mbano

HI,

Is there a way to extract statistics from SpamAssassin, as sa-stats
does, when spamd is not used (so there are no logs in spamd format)
and SpamAssassin is called from amavisd-new instead?
Does anybody have a similar need?

Or .. logs in sql and php... 

thanks in advance
-- 
View this message in context: 
http://www.nabble.com/sa-stats-and-no-spamd-logs.-tf3722909.html#a10417475
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: sa-stats and Spamtagging

2007-02-13 Thread LuKreme

On 13-Feb-2007, at 09:08, Alexis Manning wrote:

[EMAIL PROTECTED] says...

Am I worrying over nothing?  I do seem to get spam only on those
accounts for which greylisting is inactive, but on those I get a LOT
that SA fails to tag, including just about every one of those image
spams with the 2K or so of seemingly randomish text in the plain/text
portion.


Have you considered FuzzyOCR or ImageInfo?


No, I haven't really looked into it.  I did note that the version in  
ports is 2.3 and that version is no longer maintained. Since  
everything SA related is managed in my ports tree, I am loath to  
install FuzzyOCR separately.  I think that's as far as I got last  
time.  Also, FuzzyOCR seems to have a lot of dependencies, which  
makes  non-ports install even less desirable.


I went ahead and tried to install ImageInfo from SARE, so we'll see  
how that goes.  I get a lot of warnings on --lint though:


[18402] dbg: plugin: loading Mail::SpamAssassin::Plugin::DKIM from @INC
[18402] warn: plugin: failed to parse plugin (from @INC): Can't  
locate Mail/DKIM.pm in @INC (@INC contains: /usr/local/lib/perl5/ 
site_perl/5.8.8 /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/lib/ 
perl5/site_perl/5.8.8/mach /usr/local/lib/perl5/site_perl/5.8.7 /usr/ 
local/lib/perl5/site_perl/5.8.2 /usr/local/lib/perl5/site_perl/5.6.2 / 
usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl / 
usr/local/lib/perl5/5.8.8/mach /usr/local/lib/perl5/5.8.8) at /usr/ 
local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Plugin/DKIM.pm line  
60.

[18402] warn: Compilation failed in require at (eval 99) line 1.
[18402] warn: plugin: failed to create instance of plugin  
Mail::SpamAssassin::Plugin::DKIM: Can't locate object method "new"  
via package "Mail::SpamAssassin::Plugin::DKIM" at (eval 100) line 1.



Without them I know that I'd be slammed by the 'buy your drugs  
here' image spams.  Obviously there's going to be a CPU hit for  
FuzzyOCR but perhaps with your greylisting the number of messages  
that it'll work on will be manageable?


I expect so, the mailserver is under a very light load.

I'll see how ImageInfo works for now.



--
There are 10 types of people in the world: Those who understand  
binary, and those who don't.





Re: sa-stats and Spamtagging

2007-02-13 Thread LuKreme

On 13-Feb-2007, at 08:39, Chris St. Pierre wrote:

This is where a user feedback loop -- such as spam/ham reporting links
in your webmail client, or the equivalent training for desktop client
users -- can be really useful.


Ideally I'd like to have per-user bayes, but some of my users are  
managed through courier/mysql and I've just not gotten to the point  
of working out how to manage bayes for those users, or whether it's  
even possible.


I guess what I'd like to have is a IMAP mailbox created for every  
user where they can drop in spam and have bayes learn it.  I set  
something up for the non-mysql users that worked, mostly, but never  
got further than that.



--
The other cats just think he's a tosser. --Neil Gaiman




Re: sa-stats and Spamtagging

2007-02-13 Thread Alexis Manning
[EMAIL PROTECTED] says...
> Am I worrying over nothing?  I do seem to get spam only on those  
> accounts for which greylisting is inactive, but on those I get a LOT  
> that SA fails to tag, including just about every one of those image  
> spams with the 2K or so of seemingly randomish text in the plain/text  
> portion.

Have you considered FuzzyOCR or ImageInfo?  Without them I know that I'd 
be slammed by the 'buy your drugs here' image spams.  Obviously there's 
going to be a CPU hit for FuzzyOCR but perhaps with your greylisting the 
number of messages that it'll work on will be manageable?

-- A.


Re: sa-stats and Spamtagging

2007-02-13 Thread Chris St. Pierre

On Tue, 13 Feb 2007, LuKreme wrote:

Now, perhaps I am misunderstanding, but BAYES_99 is hitting on 5% of ham? and 
AWL on 35% of spam?


Keep in mind that AWL is slightly misnamed; it doesn't just whitelist,
it adjusts scores (both positively and negatively) based on previous
history.  So the fact that it's hitting on 35% of your spam is pretty
meaningless, really.
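
To see why AWL can push a score in either direction, here is a small sketch of the adjustment as I understand it from the description of the auto_whitelist_factor setting: the final score is moved part of the way from the message's own score toward the sender's long-term mean. Treat the formula and the default factor of 0.5 as assumptions to check against your SA version; the function name is made up for illustration.

```python
def awl_adjusted(score, sender_mean, factor=0.5):
    """Regress a message score toward the sender's historical mean score."""
    return score + (sender_mean - score) * factor

# A sender with a clean history pulls a spammy-looking message down ...
print(awl_adjusted(score=12.0, sender_mean=2.0))  # 7.0
# ... and a sender with a spammy history pushes a clean-looking one up.
print(awl_adjusted(score=1.0, sender_mean=9.0))   # 5.0
```

In both cases the AWL rule shows up in the list of rules that fired, which is why a high AWL hit rate on spam says little by itself.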

sa-stats counts something as spam that SA marks as spam.  So the fact
that BAYES_99 is hitting on 5% of ham means (roughly) that 5% of your
unmarked mail hit either only BAYES_99 or BAYES_99 and not enough
other rules to mark it as spam.  That means, respectively, that either
you need to work on training your Bayes better, or that your Bayesian
component is very well trained and that you need to turn up the scores
for BAYES_99.  The only way to know the difference is to look at the
messages that are getting tagged with BAYES_99 but are not marked as
spam. If Bayes is right about them, turn up your scoring; if not,
continue training.

This is where a user feedback loop -- such as spam/ham reporting links
in your webmail client, or the equivalent training for desktop client
users -- can be really useful.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

Never send mail to [EMAIL PROTECTED]



sa-stats and Spamtagging

2007-02-13 Thread LuKreme

I recently ran sa-stats (Dallas's script, not the one in SA)

Email: 10373  Autolearn:  1575  AvgScore:   7.45  AvgScanTime:  3.74 sec
Spam:   6179  Autolearn:   680  AvgScore:  12.44  AvgScanTime:  4.03 sec
Ham:    4194  Autolearn:   895  AvgScore:   0.10  AvgScanTime:  3.33 sec


Time Spent Running SA:       10.79 hours
Time Spent Processing Spam:   6.91 hours
Time Spent Processing Ham:    3.88 hours

TOP SPAM RULES FIRED
--
RANK  RULE NAME          COUNT  %OFMAIL  %OFSPAM  %OFHAM
--
   1  HTML_MESSAGE        4549    74.92    73.62   76.82
   2  BAYES_99            3941    40.06    63.78    5.10
   3  AWL                 2179    49.99    35.26   71.67
   4  BOTNET              1866    18.40    30.20    1.03
   5  URIBL_JP_SURBL      1667    16.15    26.98    0.19
--

TOP HAM RULES FIRED
--
RANK  RULE NAME          COUNT  %OFMAIL  %OFSPAM  %OFHAM
--
   1  HTML_MESSAGE        3222    74.92    73.62   76.82
   2  AWL                 3006    49.99    35.26   71.67
   3  BAYES_00            2522    25.40     1.83   60.13
   4  MIME_HTML_ONLY      1693    28.26    20.04   40.37
   5  FORGED_RCVD_HELO    1195    16.77     8.82   28.49
--

Now, perhaps I am misunderstanding, but BAYES_99 is hitting on 5% of  
ham? and AWL on 35% of spam?
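
For reference, the percentage columns can be reproduced from the counts in the report. The sketch below does this for BAYES_99; it assumes COUNT in the spam table is the number of spam messages hitting the rule, and it estimates the ham hit count from the %OFHAM column since that table row is not shown.

```python
# Totals from the report header.
spam_total, ham_total = 6179, 4194

# BAYES_99 row from the TOP SPAM RULES table.
bayes99_spam_hits = 3941
bayes99_pct_of_ham = 5.10

# %OFSPAM: share of spam-classified mail that hit the rule.
pct_of_spam = 100 * bayes99_spam_hits / spam_total

# Estimated number of ham-classified messages that hit the rule.
ham_hits = round(bayes99_pct_of_ham / 100 * ham_total)

# %OFMAIL: share of all scanned mail that hit the rule.
pct_of_mail = 100 * (bayes99_spam_hits + ham_hits) / (spam_total + ham_total)

print(round(pct_of_spam, 2))  # 63.78
print(ham_hits)               # 214
print(round(pct_of_mail, 2))  # 40.06
```

So the 5% figure corresponds to roughly 214 messages that hit BAYES_99 but stayed under the spam threshold.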


Looking at this, it looks to my, albeit untrained, eye as if something  
is quite wrong with my spam-tagging solution.


Now, to be fair, a large percentage of the incoming spam is being  
stopped by greylisting before SA ever sees it.


Am I worrying over nothing?  I do seem to get spam only on those  
accounts for which greylisting is inactive, but on those I get a LOT  
that SA fails to tag, including just about every one of those image  
spams with the 2K or so of seemingly randomish text in the plain/text  
portion.


I am running RDJ with several rules and my SA version is  
SpamAssassin-3.1.7


TRUSTED_RULESETS="TRIPWIRE EVILNUMBERS RANDOMVAL
BOGUSVIRUS SARE_ADULT SARE_FRAUD SARE_BML SARE_SPOOF
SARE_BAYES_POISON_NXM SARE_OEM SARE_RANDOM SARE_HEADER_ABUSE
SARE_SPECIFIC SARE_CODING_HTML SARE_GENLSUBJ SARE_UNSUB SARE_URI0
SARE_REDIRECT_POST300 SARE_OBFU";

and RDJ is not reporting any errors

--
#27794   ... I wonder if the really nerdy Klingons learn how  
to speak english





Help with sa-stats

2006-11-28 Thread John Tice
I am trying to install Dallas Engelken's version of sa-stats, and as a  
rank novice I could use some help...

http://www.rulesemporium.com/programs/sa-stats.txt

I'm on a VPS with cPanel and multiple domains. I installed this into the  
cgi-bin in my domain (not the primary server domain) and it executes,  
except that it contains no data. So I moved it to  
/root/public_html/cgi-bin/ on the server, but it's not found when I  
point my browser at it. Permissions are 755. I'm guessing it's not in  
the right place, or else I'd at least get the results page as I do when  
it's in the mydomain/cgi-bin location. Where should I put it? Do I need  
to show it the path to the logs, and if so, where are they located?


Also, is there a different version of this included with  
spamassassin, and how do I turn it on or access it?

Thanks-


RE: SA-STATS on BSD

2006-11-09 Thread Jean-Paul Natola


-Original Message-
From: Odhiambo WASHINGTON [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 09, 2006 9:30 AM
To: Jean-Paul Natola
Cc: users@spamassassin.apache.org
Subject: Re: SA-STATS on BSD

* On 08/11/06 19:15 -0500, Jean-Paul Natola wrote:
| Hi everyone,
|  
| I've searched the Apache, SARE, and BSD sites for documentation on
| installing sa-stats. I have found the actual sa-stats.pl, but I don't
| know how to go about installing it on BSD. Any guidance would be
| appreciated.
|  
| Freebsd 5.4
| exim
| sa 3.1.7

cd /usr/ports/mail/p5-Mail-SpamAssassin
make -DWITH_TOOLS reinstall

If you install the utility called portupgrade, you can do:

portupgrade -m WITH_TOOLS=1 p5-Mail-SpamAssassin

The sa-stats.pl will then be installed in
/usr/local/share/spamassassin/tools/

HTH



-Wash

http://www.netmeister.org/news/learn2quote.html

DISCLAIMER: See http://www.wananchi.com/bms/terms.php

--
+==+
|\  _,,,---,,_ | Odhiambo Washington<[EMAIL PROTECTED]>
Zzz /,`.-'`'-.  ;-;;,_ | Wananchi Online Ltd.   www.wananchi.com
   |,4-  ) )-,_. ,\ (  `'-'| Tel: +254 20 313985-9  +254 20 313922
  '---''(_/--'  `-'\_) | GSM: +254 722 743223   +254 733 744121
+==+

Zero Defects, n.:
The result of shutting down a production line.

Thanks to Wash I got it installed. However, I am getting zero stats. Is
sa-stats designed to work ONLY with maillog?

I do not have maillog; I have /var/log/exim/mainlog.

This was the syntax I used:

/usr/local/share/spamassassin/tools/sa-stats.pl -s 'midnight' -e 'now' > sa_stats.txt

And these were the results:
Report Title     : SpamAssassin - Spam Statistics
Report Date      : 2006-11-09
Period Beginning : Thu Nov  9 00:00:00 2006
Period Ending    : Thu Nov  9 11:57:53 2006

Reporting Period : 11.96 hrs
--

Note: 'ham' = 'nonspam'

Total spam detected    :        0 (   0.00%)
Total ham accepted     :        0 (   0.00%)
---
Total emails processed :        0 (0/hr)

Average spam threshold :     0.00
Average spam score     :     0.00
Average ham score      :     0.00

Spam kbytes processed  :        0   (0 kb/hr)
Ham kbytes processed   :        0   (0 kb/hr)
Total kbytes processed :        0   (0 kb/hr)

Spam analysis time     :        0 s (0 s/hr)
Ham analysis time      :        0 s (0 s/hr)
Total analysis time    :        0 s (0 s/hr)


Statistics by Hour

Hour  Spam   Ham
----
2006-11-09 00 0 (  0%)  0 (  0%)
2006-11-09 01 0 (  0%)  0 (  0%)
2006-11-09 02 0 (  0%)  0 (  0%)
2006-11-09 03 0 (  0%)  0 (  0%)
2006-11-09 04 0 (  0%)  0 (  0%)
2006-11-09 05 0 (  0%)  0 (  0%)
2006-11-09 06 0 (  0%)  0 (  0%)
2006-11-09 07 0 (  0%)  0 (  0%)
2006-11-09 08 0 (  0%)  0 (  0%)
2006-11-09 09 0 (  0%)  0 (  0%)
2006-11-09 10 0 (  0%)  0 (  0%)
2006-11-09 11 0 (  0%)  0 (  0%)


Done. Report generated in 27 sec by sa-stats.pl, version 6256.


Re: SA-STATS on BSD

2006-11-09 Thread Odhiambo WASHINGTON
* On 08/11/06 19:15 -0500, Jean-Paul Natola wrote:
| Hi everyone,
|  
| I've searched the Apache, SARE, and BSD sites for documentation on
| installing sa-stats. I have found the actual sa-stats.pl, but I don't
| know how to go about installing it on BSD. Any guidance would be appreciated.
|  
| Freebsd 5.4
| exim
| sa 3.1.7

cd /usr/ports/mail/p5-Mail-SpamAssassin
make -DWITH_TOOLS reinstall

If you install the utility called portupgrade, you can do:

portupgrade -m WITH_TOOLS=1 p5-Mail-SpamAssassin

The sa-stats.pl will then be installed in
/usr/local/share/spamassassin/tools/

HTH



-Wash

http://www.netmeister.org/news/learn2quote.html

DISCLAIMER: See http://www.wananchi.com/bms/terms.php

--
+==+
|\  _,,,---,,_ | Odhiambo Washington<[EMAIL PROTECTED]>
Zzz /,`.-'`'-.  ;-;;,_ | Wananchi Online Ltd.   www.wananchi.com
   |,4-  ) )-,_. ,\ (  `'-'| Tel: +254 20 313985-9  +254 20 313922
  '---''(_/--'  `-'\_) | GSM: +254 722 743223   +254 733 744121
+==+

Zero Defects, n.:
The result of shutting down a production line.


SA-STATS on BSD

2006-11-08 Thread Jean-Paul Natola
Hi everyone,
 
I've searched the Apache, SARE, and BSD sites for documentation on
installing sa-stats. I have found the actual sa-stats.pl, but I don't
know how to go about installing it on BSD. Any guidance would be appreciated.
 
Freebsd 5.4
exim
sa 3.1.7
 
 
 
 
 
 
 
 
Jean-Paul Natola
Network Administrator
Information Technology
Family Care International
588 Broadway Suite 503
New York, NY 10012
Phone:212-941-5300 xt 36
Fax:  212-941-5563
Mailto: [EMAIL PROTECTED]


Re: [OT] Stats up drastically from a year ago.

2006-10-25 Thread Richard Frovarp

Chris Santerre wrote:


Just for giggles! Keeping exact numbers out of it, here are the stats 
compared to a year ago:


RBL blocks up 3 fold!
Spam caught by SA doubled.
Legit email traffic also doubled.

Whe, what a year!

Thanks,

Chris Santerre
SysAdmin and Spamfighter
www.rulesemporium.com
www.uribl.com



Looking at similar numbers for us, we are blocking three times as many 
messages as over a year ago. However, our legit traffic is slightly 
down. Kind of sucks to add hardware and know it isn't due to legit traffic.


Richard


[OT] Stats up drastically from a year ago.

2006-10-25 Thread Chris Santerre





Just for giggles! Keeping exact numbers out of it, here are the stats compared to a year ago:


RBL blocks up 3 fold!
Spam caught by SA doubled.
Legit email traffic also doubled. 


Whe, what a year! 


Thanks,


Chris Santerre
SysAdmin and Spamfighter
www.rulesemporium.com
www.uribl.com







RE: Stats of rules ?

2006-09-27 Thread Bowie Bailey
Chris wrote:
> On Tuesday 26 September 2006 2:50 pm, Bowie Bailey wrote:
> > Noc Phibee wrote:
> > > Hi
> > > 
> > > On my SpamAssassin server I use a lot of rules,
> > > personal and downloaded.
> > > 
> > > Does anyone know of a tool that shows, over 24h or 48h,
> > > whether a rule is being used or not?
> > 
> > If you just want to know if the rule is getting hits, you can do a
> > simple grep against your maillog file.
> > 
> > For more in-depth stats, try this script:
> > 
> > http://www.rulesemporium.com/programs/sa-stats.txt
> > 
> > Rename it to sa-stats.pl before you run it.
> 
> Your script is still running great over here. If he's looking for
> something different than what sa-stats.pl provides and if your script
> is for public consumption, you may want to suggest it to him. I've
> also got it running daily in a cron job if he wants something like
> that. 

My script can be used as well, although it is more for add-on rules in
particular.  It does not give stats on any of the built-in rules.

I'm attaching an updated version.  I have fixed the rulename detection
so that it will pick up on the fuzzyocr rules now (it will list their
score as 0 since they don't have a score line associated).

-- 
Bowie



sa-addon-stats.pl
Description: Binary data


Re: Stats of rules ?

2006-09-26 Thread John D. Hardin
On Tue, 26 Sep 2006, Noc Phibee wrote:

> Does anyone know of a tool that shows, over 24h or 48h, whether a
> rule is being used or not?

Depending on how your SA is set up, you may be able to see the rules
that are hit in /var/log/maillog

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 It may be possible to start a programme of weapon registration as a
 first step towards the physical collection phase. ... Assurances
 must be provided, and met, that the process of registration will
 not lead to immediate weapons seizures by security forces.
  -- the UN, who "doesn't want to confiscate guns"
---



RE: Stats of rules ?

2006-09-26 Thread Bowie Bailey
Noc Phibee wrote:
> Hi
> 
> On my SpamAssassin server, I use a lot of rules,
> both personal and downloaded.
> 
> Does anyone know of a tool that shows whether a rule
> has been used in the last 24 or 48 hours?

If you just want to know if the rule is getting hits, you can do a
simple grep against your maillog file.
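
A minimal sketch of that "simple grep" idea in Python, assuming spamd
writes its usual "spamd: result:" lines to the mail log. The exact line
layout and log location depend on the setup, and count_rule_hits is a
name made up for this illustration. It tallies how often each rule name
appears, so a rule that never shows up has not fired in the period the
log covers:

```python
import re
from collections import Counter

# Assumed layout of a spamd result line (yours may differ):
#   ... spamd: result: . -1 - AWL,BAYES_00 scantime=2.3,size=5146,...
RESULT_RE = re.compile(r"spamd: result: \S -?[\d.]+ - (\S*) scantime=")


def count_rule_hits(lines):
    """Count, per rule name, how many scanned messages hit that rule."""
    hits = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # not a spamd result line
        hits.update(name for name in match.group(1).split(",") if name)
    return hits


# Two made-up log lines; in practice, pass an open log file instead.
sample = [
    "Aug 21 17:58:51 main spamd[4064]: spamd: result: . -1 - AWL,BAYES_00 scantime=2.3,size=5146",
    "Aug 21 18:02:10 main spamd[4064]: spamd: result: Y 12 - BAYES_99,URIBL_BLACK scantime=1.9,size=2048",
]
for rule, count in count_rule_hits(sample).most_common():
    print(f"{count:6d}  {rule}")
```

Restricting the input to the last 24 or 48 hours is then a matter of
which log lines are fed in, for example only the current day's log file.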

For more in-depth stats, try this script:

http://www.rulesemporium.com/programs/sa-stats.txt

Rename it to sa-stats.pl before you run it.

-- 
Bowie


Stats of rules ?

2006-09-26 Thread Noc Phibee

Hi

On my SpamAssassin server, I use a lot of rules,
both personal and downloaded.

Does anyone know of a tool that shows whether a rule
has been used in the last 24 or 48 hours?

thanks bye






stats on SPAM filtering efficiency

2006-09-06 Thread John Goubeaux

Folks,

Does anyone know of a reliable source of info that reports on SPAM
filtering efficiency, as well as the number of systems that still
do NOT employ any form of spam filtering at all?  Not too long
ago I happened upon some Gartner Group data indicating that some
60% of mail systems still did not employ spam filtering capabilities
of any sort!


I realize I can go searching for this info, and I am doing so, but I
also wanted to see whether those of us who actually spend time on this
(and, based on the traffic on this list, a LOT of time) have happened
upon data that sheds some light on the situation: something that would
help educate and inform management and/or supervisors about what
they are up against, and serve as a benchmark against which to compare
what they are actually doing successfully right now.


Any ideas or comments are appreciated!

Thanks  -john
--
John Goubeaux
Systems Administrator
Gevirtz Graduate School of Education
UC Santa Barbara
Phelps Hall 3534
805 893-8190


Re: [Devel-spam] Hash Stats

2006-08-30 Thread decoder
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

- --[ UxBoD ]-- wrote:
> How many hits are you getting ?
>
> Database changed
> mysql> select count(*) from maillog where spamreport like
>     '%FUZZY_OCR%' and date = '2006-08-29';
> +----------+
> | count(*) |
> +----------+
> |      385 |
> +----------+
> 1 row in set (0.10 sec)
>
> mysql> select count(*) from maillog where spamreport like
>     '%FUZZY_OCR_KNOWN_HASH%' and date = '2006-08-29';
> +----------+
> | count(*) |
> +----------+
> |        1 |
> +----------+
> 1 row in set (0.05 sec)
>
> mysql> select count(*) from maillog where spamreport like
>     '%FUZZY_OCR_CORRUPT%' and date = '2006-08-29';
> +----------+
> | count(*) |
> +----------+
> |      298 |
> +----------+
> 1 row in set (0.05 sec)
>
> --[ UxBoD ]-- // PGP Key: "curl -s
> http://www.splatnix.net/uxbod.asc | gpg --import" // Fingerprint:
> 543A E778 7F2D 98F1 3E50  9C1F F190 93E0 E8E8 0CF8 // Keyserver:
> www.keyserver.net Key-ID: 0xE8E80CF8
>
>
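
For reference, the three counts quoted above could be gathered in one
pass. This is only a sketch: it uses Python's sqlite3 with an in-memory
table as a stand-in for the MySQL maillog table, the rows are made up,
and only the two column names (date, spamreport) come from the queries
above. Two things are worth noting: '_' is a single-character wildcard
in LIKE, so it is escaped here, and the '%FUZZY_OCR%' pattern also
matches the KNOWN_HASH and CORRUPT rows, so the first count includes
the other two.

```python
import sqlite3

# In-memory stand-in for the real maillog table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maillog (date TEXT, spamreport TEXT)")
conn.executemany(
    "INSERT INTO maillog VALUES (?, ?)",
    [
        ("2006-08-29", "BAYES_99,FUZZY_OCR"),
        ("2006-08-29", "FUZZY_OCR_CORRUPT"),
        ("2006-08-29", "FUZZY_OCR_KNOWN_HASH"),
        ("2006-08-29", "BAYES_00"),
        ("2006-08-28", "FUZZY_OCR"),
    ],
)

# Each LIKE evaluates to 0 or 1, so SUM() counts the matching rows.
# The backslash escapes make '_' match a literal underscore.
QUERY = r"""
SELECT
    SUM(spamreport LIKE '%FUZZY\_OCR%' ESCAPE '\'),
    SUM(spamreport LIKE '%FUZZY\_OCR\_KNOWN\_HASH%' ESCAPE '\'),
    SUM(spamreport LIKE '%FUZZY\_OCR\_CORRUPT%' ESCAPE '\')
FROM maillog
WHERE date = ?
"""
any_ocr, known_hash, corrupt = conn.execute(QUERY, ("2006-08-29",)).fetchone()
print(any_ocr, known_hash, corrupt)
```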

Did you apply the patch I sent to the SA mailing list? There is a bug
in 2.3b which breaks the database completely. Please fix the
corresponding line:

line 492:


It says:

  print DB "$score::$digest\n";


Should be:

  print DB "${score}::${digest}\n";



As a result, the hashdb it produced is corrupted; delete it and start
with a new one...
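
The underlying problem is that Perl reads "$score::" inside a string as
a package-qualified variable name, not as $score followed by a literal
"::", which is why the braces are needed. To check whether an existing
hashdb was affected, something like the following Python sketch could
be used. It assumes one "score::digest" record per line (as the print
statement above suggests) and a numeric score; both function names are
made up for this illustration.

```python
def parse_hashdb_line(line):
    """Split one 'score::digest' record; return None if it is malformed."""
    score_text, separator, digest = line.rstrip("\n").partition("::")
    if not separator or not digest:
        return None  # missing '::' or empty digest
    try:
        score = float(score_text)  # assumption: the score is numeric
    except ValueError:
        return None
    return score, digest


def find_corrupt_lines(lines):
    """Return the 1-based numbers of lines that do not parse."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if parse_hashdb_line(line) is None
    ]


# Made-up records: lines 2 and 3 lack a usable score.
sample = ["10::abc123\n", "abc123\n", "::abc123\n", "2.5::ff00\n"]
print(find_corrupt_lines(sample))
```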


Chris
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE9XUrJQIKXnJyDxURAoWOAJ9ej8U66qKCGiJSrPYM51ZP0WHGnQCfZWqa
8BxDIenQxw0JrGD/31hQshI=
=lDtr
-END PGP SIGNATURE-



Re: SA logging options wrong uid Debian-exim sa-stats

2006-08-22 Thread Magnus Holmgren
On Monday 21 August 2006 22:21, Stefan Bauer took the opportunity to say:
> I am using Debian with SpamAssassin 3.1.1-1 and exim 4.62.
>
> I would like to use sa-stats[1] with the stats SpamAssassin writes
> to /var/log/exim4/mainlog.log, like:
>
> Aug 21 17:58:51 main spamd[4064]: spamd: result: . -1 - AWL,BAYES_00
> scantime=2.3,size=5146,user=Debian-exim,uid=104,required_score=3.0,rhost=localhost.
> localdomain,raddr=127.0.0.1,rport=49475,mid=<[EMAIL PROTECTED]>,rmid=
> <[EMAIL PROTECTED]>,bayes=1.11668452262847e-11,autolearn=no
>
> This works, but not very well. SpamAssassin logs to the file above, but
> the user= field is always Debian-exim. How can I set up
> SpamAssassin to log, or deliver the mail, under the uid of
> the user who received the mail?

This is an Exim question, which you should ask exim-users@exim.org or 
[EMAIL PROTECTED] about.

> Running sa-stats only gets me stats[2] for the user Debian-exim,
> which covers all mail.
>
> So my question is: how can I get SA to deliver the mail under
> the UID of each user, so that I get usable logs?

It depends on how you call SpamAssassin from Exim, which in turn partly 
depends on whether you want personal user preferences or not. With sa-exim 
you can't. With the exiscan ACL condition (spam = ) you can, but you 
have to make special arrangements to unambiguously decide which user to scan 
for if there are many recipients. If you call SA late in the delivery 
process, for instance as a transport filter, once for each recipient, then 
it's easy.

So please come to the Exim mailing lists and describe your setup in more 
detail.

> [1] http://david.hexstream.co.uk/scripts/sa-stats/sa-stats.pl.html
> [2] http://www.plzk.de/stats/spam

-- 
Magnus Holmgren        [EMAIL PROTECTED]
   (No Cc of list mail needed, thanks)



SA logging options wrong uid Debian-exim sa-stats

2006-08-21 Thread Stefan Bauer

Hello List,

I am using Debian with SpamAssassin 3.1.1-1 and exim 4.62.

I would like to use sa-stats[1] with the stats SpamAssassin writes
to /var/log/exim4/mainlog.log, like:


Aug 21 17:58:51 main spamd[4064]: spamd: result: . -1 - AWL,BAYES_00
scantime=2.3,size=5146,user=Debian-exim,uid=104,required_score=3.0,rhost=localhost.
localdomain,raddr=127.0.0.1,rport=49475,mid=<[EMAIL PROTECTED]>,rmid=
<[EMAIL PROTECTED]>,bayes=1.11668452262847e-11,autolearn=no

This works, but not very well. SpamAssassin logs to the file above, but
the user= field is always Debian-exim. How can I set up
SpamAssassin to log, or deliver the mail, under the uid of
the user who received the mail?


Running sa-stats only gets me stats[2] for the user Debian-exim,
which covers all mail.
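
To make the problem concrete, here is a rough Python sketch of a
per-user breakdown taken from the user= field of the spamd result
lines. It assumes the log layout shown above and that spamd marks spam
with "Y" and ham with "." after "result:"; per_user_totals and the user
names in the sample are invented. With the setup described here every
line carries user=Debian-exim, so the breakdown collapses into a single
entry until spamd is called with the real recipient.

```python
import re
from collections import defaultdict

FLAG_RE = re.compile(r"spamd: result: (\S) ")  # "Y" = spam, "." = ham
USER_RE = re.compile(r"\buser=([^,\s]+)")


def per_user_totals(lines):
    """Return {user: {"spam": n, "ham": m}} from spamd result lines."""
    totals = defaultdict(lambda: {"spam": 0, "ham": 0})
    for line in lines:
        flag = FLAG_RE.search(line)
        user = USER_RE.search(line)
        if flag is None or user is None:
            continue  # not a result line, or no user= field
        kind = "spam" if flag.group(1) == "Y" else "ham"
        totals[user.group(1)][kind] += 1
    return dict(totals)


# Made-up lines with hypothetical users "alice" and "bob".
sample = [
    "spamd: result: Y 14 - BAYES_99 scantime=1.2,size=900,user=alice,uid=1001",
    "spamd: result: . -1 - AWL,BAYES_00 scantime=2.3,size=5146,user=alice,uid=1001",
    "spamd: result: . 0 - BAYES_00 scantime=0.8,size=700,user=bob,uid=1002",
]
for user, counts in sorted(per_user_totals(sample).items()):
    print(user, counts["spam"], counts["ham"])
```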


So my question is: how can I get SA to deliver the mail under
the UID of each user, so that I get usable logs?


[1] http://david.hexstream.co.uk/scripts/sa-stats/sa-stats.pl.html
[2] http://www.plzk.de/stats/spam

--
thanks in advance

Stefan Bauer

-->
www.plzk.de - www.plzk.com
---<


Some Ebay stats

2006-08-08 Thread qqqq
This is interesting.

This is a list of relays with the From field matching '@ebay.'

202.64.65.129.in-addr.arpa domain name pointer gabriel.its.calpoly.edu.
204.64.65.129.in-addr.arpa domain name pointer 
email-gateway-michael.its.calpoly.edu.
10.193.98.140.in-addr.arpa domain name pointer ruebert.ieee.org.
23.193.98.140.in-addr.arpa domain name pointer engine.ieee.org.
55.1.41.198.in-addr.arpa domain name pointer mx01.nic.name.
56.1.41.198.in-addr.arpa domain name pointer mx02.nic.name.
34.3.41.198.in-addr.arpa domain name pointer mx04.nic.name.
35.3.41.198.in-addr.arpa domain name pointer mx05.nic.name.
199.132.22.203.in-addr.arpa domain name pointer wm-06.dcsi.net.au.
51.11.13.204.in-addr.arpa domain name pointer bellerophon.decipherinc.com.
173.52.190.206.in-addr.arpa domain name pointer smtp104.biz.mail.re2.yahoo.com.
45.66.191.209.in-addr.arpa domain name pointer 
mailforward101.store.mud.yahoo.com.
47.66.191.209.in-addr.arpa domain name pointer 
mailforward103.store.mud.yahoo.com.
48.66.191.209.in-addr.arpa domain name pointer 
mailforward104.store.mud.yahoo.com.
21.98.109.210.in-addr.arpa is an alias for 21.0-255.98.109.210.in-addr.arpa.
21.0-255.98.109.210.in-addr.arpa domain name pointer mail-kr1.bigfoot.com.
228.216.115.211.in-addr.arpa is an alias for 228.0-255.216.115.211.in-addr.arpa.
228.0-255.216.115.211.in-addr.arpa domain name pointer mail-kr.bigfoot.com.
150.180.144.216.in-addr.arpa is an alias for 216.144.180.150.rev.k12system.com.
216.144.180.150.rev.k12system.com domain name pointer boboshrimps.dok.org.
51.145.200.216.in-addr.arpa domain name pointer sitemail.everyone.net.
105.160.220.216.in-addr.arpa domain name pointer dickory.paonline.com.
84.244.33.216.in-addr.arpa domain name pointer sem.ebay.com.
123.161.121.59.in-addr.arpa domain name pointer 
59-121-161-123.dynamic.hinet.net.
106.255.123.61.in-addr.arpa domain name pointer 61123255106.cidr.odn.ne.jp.
3.186.209.63.in-addr.arpa domain name pointer mx1.iviewer.com.
163.171.251.63.in-addr.arpa domain name pointer m1.dnsix.com.
19.14.80.63.in-addr.arpa domain name pointer mailhost.liveworld.com.
202.166.71.64.in-addr.arpa is an alias for 202.subnet192.166.71.64.in-addr.arpa.
202.subnet192.166.71.64.in-addr.arpa domain name pointer spf5.us4.outblaze.com.
138.45.48.65.in-addr.arpa domain name pointer 
reverse.138.45.48.65.static.ldmi.com.
5.21.134.66.in-addr.arpa domain name pointer h-66-134-21-5.hstqtx02.covad.net.
180.195.135.66.in-addr.arpa domain name pointer data.ebay.com.
11.197.135.66.in-addr.arpa domain name pointer mxpool05.ebay.com.
12.197.135.66.in-addr.arpa domain name pointer mxpool06.ebay.com.
13.197.135.66.in-addr.arpa domain name pointer mxpool07.ebay.com.
14.197.135.66.in-addr.arpa domain name pointer mxpool08.ebay.com.
15.197.135.66.in-addr.arpa domain name pointer mxpool09.ebay.com.
16.197.135.66.in-addr.arpa domain name pointer mxpool10.ebay.com.
17.197.135.66.in-addr.arpa domain name pointer mxpool11.ebay.com.
18.197.135.66.in-addr.arpa domain name pointer mxpool12.ebay.com.
19.197.135.66.in-addr.arpa domain name pointer mxpool13.ebay.com.
20.197.135.66.in-addr.arpa domain name pointer mxpool14.ebay.com.
21.197.135.66.in-addr.arpa domain name pointer mxpool15.ebay.com.
22.197.135.66.in-addr.arpa domain name pointer mxpool16.ebay.com.
23.197.135.66.in-addr.arpa domain name pointer mxpool17.ebay.com.
24.197.135.66.in-addr.arpa domain name pointer mxpool18.ebay.com.
25.197.135.66.in-addr.arpa domain name pointer mxpool19.ebay.com.
26.197.135.66.in-addr.arpa domain name pointer mxpool20.ebay.com.
27.197.135.66.in-addr.arpa domain name pointer mxpool21.ebay.com.
28.197.135.66.in-addr.arpa domain name pointer mxpool22.ebay.com.
29.197.135.66.in-addr.arpa domain name pointer mxpool23.ebay.com.
8.197.135.66.in-addr.arpa domain name pointer mxpool02.ebay.com.
198.209.135.66.in-addr.arpa domain name pointer mxsmfpool01.ebay.com.
199.209.135.66.in-addr.arpa domain name pointer mxsmfpool02.ebay.com.
200.209.135.66.in-addr.arpa domain name pointer mxsmfpool03.ebay.com.
201.209.135.66.in-addr.arpa domain name pointer mxsmfpool04.ebay.com.
202.209.135.66.in-addr.arpa domain name pointer mxsmfpool05.ebay.com.
203.209.135.66.in-addr.arpa domain name pointer mxsmfpool06.ebay.com.
204.209.135.66.in-addr.arpa domain name pointer mxsmfpool07.ebay.com.
205.209.135.66.in-addr.arpa domain name pointer mxsmfpool08.ebay.com.
206.209.135.66.in-addr.arpa domain name pointer mxsmfpool09.ebay.com.
207.209.135.66.in-addr.arpa domain name pointer mxsmfpool10.ebay.com.
208.209.135.66.in-addr.arpa domain name pointer mxsmfpool11.ebay.com.
209.209.135.66.in-addr.arpa domain name pointer mxsmfpool12.ebay.com.
210.209.135.66.in-addr.arpa domain name pointer mxsmfpool13.ebay.com.
211.209.135.66.in-addr.arpa domain name pointer mxsmfpool14.ebay.com.
212.209.135.66.in-addr.arpa domain name pointer mxsmfpool15.ebay.com.
213.209.135.66.in-addr.arpa domain name pointer mxsmfpool16.ebay.com.
214.209.135.66.in-addr.arpa domain name pointer mxsmfpool17.ebay
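
A listing like the one above can be split into relays whose reverse DNS
is under ebay.com and relays that merely claim an @ebay. sender. The
Python sketch below is one way to do that under stated assumptions: the
input is the text output of the host command as shown, wrapped lines
are tolerated, and split_relays is a name invented here. A PTR name
alone proves nothing about the sender, so this is only a first sort.

```python
import re

# "\s+" also spans a line break, so wrapped entries are still matched.
POINTER_RE = re.compile(r"domain name pointer\s+(\S+)")


def split_relays(host_output):
    """Return (ebay_names, other_names) from `host` PTR output text."""
    ebay, other = [], []
    for name in POINTER_RE.findall(host_output):
        name = name.rstrip(".").lower()
        if name == "ebay.com" or name.endswith(".ebay.com"):
            ebay.append(name)
        else:
            other.append(name)
    return ebay, other


# A few entries copied from the list above, including a wrapped one.
sample = """\
202.64.65.129.in-addr.arpa domain name pointer gabriel.its.calpoly.edu.
45.66.191.209.in-addr.arpa domain name pointer
mailforward101.store.mud.yahoo.com.
84.244.33.216.in-addr.arpa domain name pointer sem.ebay.com.
11.197.135.66.in-addr.arpa domain name pointer mxpool05.ebay.com.
"""
ebay, other = split_relays(sample)
print(len(ebay), "ebay.com relays;", len(other), "others")
```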

Re: Spam success stats

2006-07-05 Thread Rick Macdougall

Joe Zitnik wrote:
> Does anyone have a source for statistics on spam victims, i.e. the number
> of people who actually click on the "Remove Me" line, or who "update
> their banking information", or who actually buy those pencil enlargement
> pills?



Not as such, but there was one client who hadn't paid his bills, so no 
updates were done on his system, which was then compromised and had a 
fake banking site installed on it.


I noticed it pretty quickly, but during the time it was up (about 2 
hours) there were 12 people who had obviously gotten a bank spam/scam and 
had entered their private PIN and bank account information.


We contacted the bank the next day and they took care of those clients 
but I was still amazed to see 12 people enter their private information 
in 2 hours.


Regards,

Rick



Spam success stats

2006-07-05 Thread Joe Zitnik

Does anyone have a source for statistics on spam victims, i.e. the number of people who actually click on the "Remove Me" line, or who "update their banking information", or who actually buy those pencil enlargement pills? 

RE: Latest sa-stats from last week

2006-05-10 Thread Bowie Bailey
jdow wrote:
> From: "Bowie Bailey" <[EMAIL PROTECTED]>
> > jdow wrote:
> > > 
> > > Importune them to feed you as large a collection of ham and
> > > spam as they can, once. Then turn on autolearn, cross your
> > > fingers, and put on your flak jacket.
> > 
> > What flak jacket?  I have Bayes turned on now and I never did any
> > manual training on most of the accounts.  I just turned it on and
> > let autolearn (with the default settings) do its thing.  So far, I
> > have received very few complaints. 
> > 
> > But then again, I think less than half of my users are even taking
> > advantage of the spam markup.  Since I don't do any blocking or
> > sorting on the server, it is up to them to use MUA rules to sort or
> > delete the spam once my server has marked it.
> 
> Fairly frequently I see evidence that autolearn can massively misfire
> on SpamAssassin startup. It does not always happen or there'd be a lot
> more messages about it. But there is apparently a vulnerable period
> that can go bad with just the wrong selection of messages. Once the
> database is large inertia will save the day.

Right.  I understand the danger of doing things this way.  I was just
pointing out that my users don't generally complain about spam.  I
assume SpamAssassin is doing well for them, but since they never tell
me anything, I really have no idea.

-- 
Bowie


Re: Latest sa-stats from last week

2006-05-10 Thread jdow

From: "Bowie Bailey" <[EMAIL PROTECTED]>


jdow wrote:

From: "Bowie Bailey" <[EMAIL PROTECTED]>

> Michael Monnerie wrote:
> > On Tuesday, 9 May 2006 16:18 Bowie Bailey wrote:
> > > I've got per-user Bayes and most of my users
> > > don't bother to train it.
> > 
> > Another reason for site-wide bayes, I'd say.
> 
> I've considered that, but it won't work in our setup.  This box
> scans our internal email as well as all of our customer's email.
> Since we are in an entirely different line of business from our
> customers, what we consider to be ham and spam will be quite
> different from theirs. If I could train it on both sets, it might
> work, but I don't have access to any of their emails for training.
> 
> Also, I really prefer a per-user bayes for our internal email
> since there are various accounts that get a specific type of ham
> and work very well with Bayes.

Importune them to feed you as large a collection of ham and spam
as they can, once. Then turn on autolearn, cross your fingers, and
put on your flak jacket.


What flak jacket?  I have Bayes turned on now and I never did any
manual training on most of the accounts.  I just turned it on and let
autolearn (with the default settings) do its thing.  So far, I have
received very few complaints.

But then again, I think less than half of my users are even taking
advantage of the spam markup.  Since I don't do any blocking or
sorting on the server, it is up to them to use MUA rules to sort or
delete the spam once my server has marked it.


Fairly frequently I see evidence that autolearn can massively misfire
on SpamAssassin startup. It does not always happen or there'd be a lot
more messages about it. But there is apparently a vulnerable period
that can go bad with just the wrong selection of messages. Once the
database is large inertia will save the day.

{^_^}


Re: Latest sa-stats from last week

2006-05-10 Thread Jay Lee




Bowie Bailey wrote:
> Michael Monnerie wrote:
> > On Wednesday, 10 May 2006 17:27 Bowie Bailey wrote:
> > > So you are saying that I should not feed Bayes with the unsolicited
> > > marketing garbage that I get because it looks like something that
> > > could have been requested?
> >
> > If it's a newsletter from a seemingly legit company I don't feed it to
> > bayes. I try to unsubscribe from them. If they still send me, I write
> > some rule to filter them. If some customer then rants, I tell them
> > that said company doesn't work nicely - and he should make a filter
> > to get e-mail from that company out of the SPAM folder again.
>
> If it comes to an account that does not subscribe to newsletters
> (webmaster, sales, etc), it is spam by definition and is fed to Bayes.
>
> > > > Remember: 10 good SPAM and HAM are better than 200 where 5% are
> > > > wrong.
> > > Wrong for who?  If it looks like marketing, 99% of the time, I don't
> > > want it.  And for most of the accounts that I deal with, this goes
> > > up to 100%.  Not true for my customers, tho.
> >
> > Yes, some manual filters can catch those. If it's stupid SPAM, then
> > bayes.
> >
> > > My philosophy with Bayes has always been to skip the ham/spam
> > > definitions and go with a wanted/unwanted model.  This way Bayes
> > > learns to filter out the emails you don't want even if some of them
> > > may technically be ham.  (Obviously, I would not be able to do this
> > > on a site-wide installation)
> >
> > But as you said your bayes is not quite accurate, so it seems not to
> > work really. Wouldn't it be better to have a highly accurate bayes,
> > and setup some filters for you personally? If a BAYES_99 would be
> > always SPAM for you, you could give it 4.5 or 5 points, and probably
> > filter more SPAM than now?
>
> If I look at my personal database, the spam percentage shown in the
> stats is lower than I'd like, but I wouldn't say it's not accurate.  I
> very rarely see a true false positive or negative with Bayes and I
> watch my account closely.  I do see a few ham with BAYES_99 and spam
> with BAYES_00, but that's usually simply because those were either
> spam that only hit BAYES_99 or ham (usually from this list) that
> tripped a few extra rules.
>
> > > But then again, I think less than half of my users are even taking
> > > advantage of the spam markup.  Since I don't do any blocking or
> > > sorting on the server, it is up to them to use MUA rules to sort or
> > > delete the spam once my server has marked it.
> >
> > I do the same, just wrote a nice document for Outlook 2003 describing
> > how to filter SPAM.
>
> I've done the same for both Outlook Express and Thunderbird.  The
> Thunderbird setup is a single checkbox. :)

It would be nice if updates.spamassassin.org wasn't using mirrors on
non-standard ports, sa-update is trying to use
http://buildbot.spamassassin.org.nyud.net:8090/updatestage/ which means
I'd have to open a port on my firewall just to get updates, sigh...

Jay




RE: Latest sa-stats from last week

2006-05-10 Thread Bowie Bailey
Michael Monnerie wrote:
> On Wednesday, 10 May 2006 17:27 Bowie Bailey wrote:
> > So you are saying that I should not feed Bayes with the unsolicited
> > marketing garbage that I get because it looks like something that
> > could have been requested?
> 
> If it's a newsletter from a seemingly legit company I don't feed it to
> bayes. I try to unsubscribe from them. If they still send me, I write
> some rule to filter them. If some customer then rants, I tell them
> that said company doesn't work nicely - and he should make a filter
> to get e-mail from that company out of the SPAM folder again.

If it comes to an account that does not subscribe to newsletters
(webmaster, sales, etc), it is spam by definition and is fed to Bayes.

> > > Remember: 10 good SPAM and HAM are better than 200 where 5% are
> > > wrong.
> > Wrong for who?  If it looks like marketing, 99% of the time, I don't
> > want it.  And for most of the accounts that I deal with, this goes
> > up to 100%.  Not true for my customers, tho.
> 
> Yes, some manual filters can catch those. If it's stupid SPAM, then
> bayes.
> 
> > My philosophy with Bayes has always been to skip the ham/spam
> > definitions and go with a wanted/unwanted model.  This way Bayes
> > learns to filter out the emails you don't want even if some of them
> > may technically be ham.  (Obviously, I would not be able to do this
> > on a site-wide installation)
> 
> But as you said your bayes is not quite accurate, so it seems not to
> work really. Wouldn't it be better to have a highly accurate bayes,
> and setup some filters for you personally? If a BAYES_99 would be
> always SPAM for you, you could give it 4.5 or 5 points, and probably
> filter more SPAM than now?

If I look at my personal database, the spam percentage shown in the
stats is lower than I'd like, but I wouldn't say it's not accurate.  I
very rarely see a true false positive or negative with Bayes and I
watch my account closely.  I do see a few ham with BAYES_99 and spam
with BAYES_00, but that's usually simply because those were either
spam that only hit BAYES_99 or ham (usually from this list) that
tripped a few extra rules.

> > But then again, I think less than half of my users are even taking
> > advantage of the spam markup.  Since I don't do any blocking or
> > sorting on the server, it is up to them to use MUA rules to sort or
> > delete the spam once my server has marked it.
> 
> I do the same, just wrote a nice document for Outlook 2003 describing
> how to filter SPAM.

I've done the same for both Outlook Express and Thunderbird.  The
Thunderbird setup is a single checkbox. :)

-- 
Bowie


Re: Latest sa-stats from last week

2006-05-10 Thread Michael Monnerie
On Wednesday, 10 May 2006 17:27 Bowie Bailey wrote:
> So you are saying that I should not feed Bayes with the unsolicited
> marketing garbage that I get because it looks like something that
> could have been requested?

If it's a newsletter from a seemingly legit company I don't feed it to 
bayes. I try to unsubscribe from them. If they still send me, I write 
some rule to filter them. If some customer then rants, I tell them that 
said company doesn't work nicely - and he should make a filter to get 
e-mail from that company out of the SPAM folder again.

> > Remember: 10 good SPAM and HAM are better than 200 where 5% are
> > wrong.
> Wrong for who?  If it looks like marketing, 99% of the time, I don't
> want it.  And for most of the accounts that I deal with, this goes up
> to 100%.  Not true for my customers, tho.

Yes, some manual filters can catch those. If it's stupid SPAM, then 
bayes.

> My philosophy with Bayes has always been to skip the ham/spam
> definitions and go with a wanted/unwanted model.  This way Bayes
> learns to filter out the emails you don't want even if some of them
> may technically be ham.  (Obviously, I would not be able to do this
> on a site-wide installation)

But as you said, your bayes is not quite accurate, so it seems not to 
really work. Wouldn't it be better to have a highly accurate bayes, and 
set up some filters for you personally? If a BAYES_99 were always 
SPAM for you, you could give it 4.5 or 5 points, and probably filter 
more SPAM than now?

> But then again, I think less than half of my users are even taking
> advantage of the spam markup.  Since I don't do any blocking or
> sorting on the server, it is up to them to use MUA rules to sort or
> delete the spam once my server has marked it.

I do the same, just wrote a nice document for Outlook 2003 describing 
how to filter SPAM.

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE



RE: Latest sa-stats from last week

2006-05-10 Thread Bowie Bailey
jdow wrote:
> From: "Bowie Bailey" <[EMAIL PROTECTED]>
> 
> > Michael Monnerie wrote:
> > > On Tuesday, 9 May 2006 16:18 Bowie Bailey wrote:
> > > > I've got per-user Bayes and most of my users
> > > > don't bother to train it.
> > > 
> > > Another reason for site-wide bayes, I'd say.
> > 
> > I've considered that, but it won't work in our setup.  This box
> > scans our internal email as well as all of our customer's email.
> > Since we are in an entirely different line of business from our
> > customers, what we consider to be ham and spam will be quite
> > different from theirs. If I could train it on both sets, it might
> > work, but I don't have access to any of their emails for training.
> > 
> > Also, I really prefer a per-user bayes for our internal email
> > since there are various accounts that get a specific type of ham
> > and work very well with Bayes.
> 
> Importune them to feed you as large a collection of ham and spam
> as they can, once. Then turn on autolearn, cross your fingers, and
> put on your flak jacket.

What flak jacket?  I have Bayes turned on now and I never did any
manual training on most of the accounts.  I just turned it on and let
autolearn (with the default settings) do its thing.  So far, I have
received very few complaints.

But then again, I think less than half of my users are even taking
advantage of the spam markup.  Since I don't do any blocking or
sorting on the server, it is up to them to use MUA rules to sort or
delete the spam once my server has marked it.

-- 
Bowie


RE: Latest sa-stats from last week

2006-05-10 Thread Bowie Bailey
Michael Monnerie wrote:
> On Tuesday, 9 May 2006 23:01 Bowie Bailey wrote:
> > Hmm... If you are training Bayes, and all of your ham is in English,
> > then what does Bayes do with the Chinese ham your customers get?
> 
> Nothing. But you won't get a SPAM report from bayes if the e-mail is
> Chinese and you never feed it Chinese-language e-mail. So no FPs.

I guess that would work if you simply don't feed Bayes with any
foreign language material at all.

> > True, spam is spam.  It's the vast differences in ham that I am more
> > worried about.  Our customers are salesmen for the most part, so
> > they are constantly sending and receiving marketing type emails.
> > For us, marketing stuff is almost always considered spam.  I think
> > this would cause a problem with false positives for our customers
> > if I train Bayes based on our idea of ham and spam.
> 
> The important thing is that you should *never* feed to bayes something
> that *could* be a legit e-mail. Most people seem to make that error. I
> do NOT feed SPAM nor HAM that could be a legit mail.

So you are saying that I should not feed Bayes with the unsolicited
marketing garbage that I get because it looks like something that
could have been requested?

> Just those Nigerians who want to give you a few million $ because you
> are so nice, or those lotteries where you won a lot but first
> have to pay, or the very good jobs a lot of people seem to offer where
> you can earn 5000$ for only 3 hours of work, and so on.
> 
> No chance this could be HAM for anybody (with at least some brain, but
> anyway you have to protect such people from themselves *g*). The same
> for feeding HAM: Give it only food that *is legit e-mail*, not some
> which could be.
> 
> Remember: 10 good SPAM and HAM are better than 200 where 5% are wrong.

Wrong for who?  If it looks like marketing, 99% of the time, I don't
want it.  And for most of the accounts that I deal with, this goes up
to 100%.  Not true for my customers, tho.

My philosophy with Bayes has always been to skip the ham/spam
definitions and go with a wanted/unwanted model.  This way Bayes
learns to filter out the emails you don't want even if some of them
may technically be ham.  (Obviously, I would not be able to do this on
a site-wide installation)

> Another good thing: Since I help with mass-checks, I found that of my
> 6000 SPAMs, I had about 4 or 5 which I had to delete (but unlearn
> before), as they were mistakes. That's the advantage you get back when
> running mass-checks.

-- 
Bowie


Re: Latest sa-stats from last week

2006-05-09 Thread jdow

From: "Bowie Bailey" <[EMAIL PROTECTED]>


Michael Monnerie wrote:

On Tuesday, 9 May 2006 16:18 Bowie Bailey wrote:
> I've got per-user Bayes and most of my users
> don't bother to train it.

Another reason for site-wide bayes, I'd say.


I've considered that, but it won't work in our setup.  This box scans
our internal email as well as all of our customer's email.  Since we
are in an entirely different line of business from our customers, what
we consider to be ham and spam will be quite different from theirs.
If I could train it on both sets, it might work, but I don't have
access to any of their emails for training.

Also, I really prefer a per-user bayes for our internal email since
there are various accounts that get a specific type of ham and work
very well with Bayes.


Importune them to feed you as large a collection of ham and spam
as they can, once. Then turn on autolearn, cross your fingers, and
put on your flak jacket.

{O.O}


Re: Latest sa-stats from last week

2006-05-09 Thread jdow

From: "Bowie Bailey" <[EMAIL PROTECTED]>


jdow wrote:

From: "Bowie Bailey" <[EMAIL PROTECTED]>

>  wrote:
> > > > TOP SPAM RULES FIRED
> > > > 
> > > > RANK  RULE NAME    COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> > > > 
> > > > 1     URIBL_BLACK  1633977.09       29.11    78.05    0.50
> > > Nice.
> > > 
> > > How does that Queen song go??  We... are...  ;)
> > 
> > LOL!  Congrats!
> 
> I'll second that!  I think the network tests are taking over...
> 
> TOP SPAM RULES FIRED
> 
> RANK  RULE NAME  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 
>    6  BAYES_99   26754      4.19    44.49    67.00    3.06

Holy spoo! Bayes can do MUCH better than that!
{O.O}


I'm sure it can, but I've got per-user Bayes and most of my users
don't bother to train it.
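
For anyone reading along, the percentages in that BAYES_99 row can be
cross-checked against each other. The sketch below assumes the usual
reading of the sa-stats columns, which is not confirmed anywhere in
this thread: %OFMAIL is the share of all scanned mail that hit the
rule, %OFSPAM the share of spam that hit it, and %OFHAM the share of
ham that hit it. Under that assumption, %OFMAIL = s * %OFSPAM +
(1 - s) * %OFHAM, where s is the fraction of mail that is spam, and
both s and the share of BAYES_99 hits that were spam follow. The
function names are made up.

```python
def spam_fraction(pct_of_mail, pct_of_spam, pct_of_ham):
    """Solve pct_of_mail = s*pct_of_spam + (1 - s)*pct_of_ham for s."""
    return (pct_of_mail - pct_of_ham) / (pct_of_spam - pct_of_ham)


def spam_share_of_hits(pct_of_mail, pct_of_spam, pct_of_ham):
    """Fraction of the rule's hits that landed on spam messages."""
    s = spam_fraction(pct_of_mail, pct_of_spam, pct_of_ham)
    return s * pct_of_spam / pct_of_mail


# BAYES_99 row from the quoted stats: %OFMAIL, %OFSPAM, %OFHAM.
row = (44.49, 67.00, 3.06)
print(f"spam fraction of all mail: {spam_fraction(*row):.3f}")
print(f"share of BAYES_99 hits that were spam: {spam_share_of_hits(*row):.3f}")
```

On these numbers roughly 65% of the scanned mail was classified as spam
and roughly 98% of the BAYES_99 hits were on spam. The 3.06% of ham
hitting BAYES_99 is the figure that stands out.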


That brings to mind an interesting question. Could SpamAssassin (ever)
be configured to accept a global Bayes with per user Bayes for er
"seasoning"? Could such a setup be effective?

{^_^}


Re: Latest sa-stats from last week

2006-05-09 Thread Michael Monnerie
On Tuesday, 9 May 2006 23:01 Bowie Bailey wrote:
> Hmm... If you are training Bayes, and all of your ham is in English,
> then what does Bayes do with the Chinese ham your customers get?

Nothing. But you won't get a SPAM report from bayes if the e-mail is 
Chinese and you never feed it Chinese-language e-mail. So no FPs.

> True, spam is spam.  It's the vast differences in ham that I am more
> worried about.  Our customers are salesmen for the most part, so they
> are constantly sending and receiving marketing type emails.  For us,
> marketing stuff is almost always considered spam.  I think this would
> cause a problem with false positives for our customers if I train
> Bayes based on our idea of ham and spam.

The important thing is that you should *never* feed to bayes something 
that *could* be a legit e-mail. Most people seem to make that error. I 
do NOT feed SPAM nor HAM that could be a legit mail.

Just those Nigerians who want to give you a few million $ because you are 
so nice, or those lotteries where you won a lot but first have to 
pay, or the very good jobs a lot of people seem to offer where you can 
earn 5000$ for only 3 hours of work, and so on.

No chance this could be HAM for anybody (with at least some brain, but 
anyway you have to protect such people from themselves *g*). The same 
for feeding HAM: Give it only food that *is legit e-mail*, not some 
which could be.

Remember: 10 good SPAM and HAM are better than 200 where 5% are wrong.

Another good thing: Since I help with mass-checks, I found that of my 
6000 SPAMs, I had about 4 or 5 which I had to delete (but unlearn 
before), as they were mistakes. That's the advantage you get back when 
running mass-checks.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE



RE: Latest sa-stats from last week

2006-05-09 Thread Bowie Bailey
Michael Monnerie wrote:
> On Dienstag, 9. Mai 2006 17:14 Bowie Bailey wrote:
> > I've considered that, but it won't work in our setup.  This box
> > scans our internal email as well as all of our customer's email.
> > Since we are in an entirely different line of business from our
> > customers, what we consider to be ham and spam will be quite
> > different from theirs. If I could train it on both sets, it might
> > work, but I don't have access to any of their emails for training.
> 
> I believe that's a general mistake. I've got a server with many diff.
> domains, some people working with china, others with brazil, many
> different languages, and so on. With site wide bayes which is only
> trained _by me_, I've not had a single complaint in years where bayes
> was incorrect.

Hmm... If you are training Bayes, and all of your ham is in English,
then what does Bayes do with the Chinese ham your customers get?

> Real SPAM is really SPAM. For everybody. Those penis enlargements,
> viagra and drug ads, and false job offers are really ever SPAM. And if
> somebody wants to get those info about penis enlargement, he should
> just look in his SPAM folder, it's not getting deleted anyway.

True, spam is spam.  It's the vast differences in ham that I am more
worried about.  Our customers are salesmen for the most part, so they
are constantly sending and receiving marketing type emails.  For us,
marketing stuff is almost always considered spam.  I think this would
cause a problem with false positives for our customers if I train
Bayes based on our idea of ham and spam.

> If you are sane and try not to make mistakes with bayes, it works
> fantastically. I've got about 6,000 spam & ham, and every day I feed the
> new SPAM to bayes for learning.
> 
> Try it: keep some real SPAM, use site-wide bayes without auto-learn.
> Feed at least 200 spam & ham to bayes, and train it every day. You
> will be happy.

I might give it a try.  But, then again, based on some testing I just
did, I might leave it the way it is.  I'll include that info in a
separate thread.

-- 
Bowie


Re[2]: Latest sa-stats from last week

2006-05-09 Thread Fred T
Hello Rick,

Monday, May 8, 2006, 4:07:53 PM, you wrote:

> Interesting, my Razor stats show a MUCH higher false positive rate, so
> much so that I had to lower the scores dramatically.

> RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>    1  RAZOR2_CHECK               9744      6.79    33.40    82.84    8.18
>    2  RAZOR2_CF_RANGE_51_100     9303      6.48    31.89    79.09    7.37
>    6  RAZOR2_CF_RANGE_E8_51_100  5597      3.90    19.18    47.59    0.52
>    8  RAZOR2_CF_RANGE_E4_51_100  5111      3.56    17.52    43.45    6.86

Ahh, but I think everyone might be missing a minor point, and that's the
design of this script.  These FPs on HAM rules are just a best guess.
Say a spam message scores only 3.0 and is not considered spam: any of
the rules that hit on that message are now going to be part of your
"ham" classification for SA-Stats.  I noticed this when installing
this script on my server.  So just because it says a rule hit 8.18% of
ham doesn't mean those hits were really on ham, only on what SA
thought was HAM...  hth
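
A minimal sketch of that effect in Python, assuming the script sorts messages into "spam" and "ham" purely by comparing SA's score to the threshold. The 5.0 threshold, the function name and the five sample messages are made up for illustration and are not taken from the real sa-stats script.

```python
from collections import Counter

THRESHOLD = 5.0  # assumed required score, not read from any real config


def rule_ham_percent(messages, threshold=THRESHOLD):
    """messages: list of (score, [rule names]).

    Returns {rule: percent of "ham" it hit}, where "ham" simply means
    "scored below the threshold", i.e. what SA thought was ham.
    """
    ham_total = 0
    ham_hits = Counter()
    for score, rules in messages:
        if score >= threshold:
            continue  # SA called this one spam
        ham_total += 1
        ham_hits.update(set(rules))
    if ham_total == 0:
        return {}
    return {rule: 100.0 * n / ham_total for rule, n in ham_hits.items()}


messages = [
    (12.4, ["RAZOR2_CHECK", "BAYES_99"]),  # spam that was caught
    (3.0, ["RAZOR2_CHECK"]),               # spam that scored too low
    (0.1, []),                             # genuine ham
    (-1.2, []),                            # genuine ham
    (0.5, []),                             # genuine ham
]

print(rule_ham_percent(messages))
```

Here RAZOR2_CHECK is reported as hitting 25% of "ham", although its only "ham" hit was the missed spam that scored 3.0.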

-- 
Best regards,
 Fred  mailto:[EMAIL PROTECTED]



Re: Latest sa-stats from last week

2006-05-09 Thread Andy Jezierski

"" <[EMAIL PROTECTED]> wrote on
05/09/2006 10:27:27 AM:

> | > Holy spoo! Bayes can do MUCH better than that!
> | > {O.O}
> |
> | I'm sure it can, but I've got per-user Bayes and most of my users
> | don't bother to train it.
> |
> 
> I'm in a similar situation as Bowie.  I had to turn off Bayes as mail
> that was obviously spam was getting a Bayes_0 pulling the # back down
> under the threshold.

I've got a sitewide Bayes and have had to lower Bayes_99
way down.  I just can't seem to get it trained properly to save my
soul.  Under SA 2.6x, Bayes ROCKED.  Just can't seem to get it
under control on 3.x.  Already started from scratch a couple of times.


SPAM

RANK  RULE NAME  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   2  BAYES_99    7598      5.93    13.90    64.07   14.77
  23  BAYES_50    1718      1.34     3.14    14.49   36.42
  28  BAYES_80     857      0.67     1.57     7.23    3.71
  30  BAYES_60     792      0.62     1.45     6.68    4.28
  33  BAYES_95     703      0.55     1.29     5.93    2.10

HAM

   2  BAYES_50   15593      8.98    28.52    14.49   36.42
   3  BAYES_00   12350      7.11    22.59     0.44   28.85
   6  BAYES_99    6323      3.64    11.57    64.07   14.77
  19  BAYES_60    1831      1.05     3.35     6.68    4.28
  21  BAYES_40    1634      0.94     2.99     0.65    3.82
  22  BAYES_80    1590      0.92     2.91     7.23    3.71
  24  BAYES_20    1519      0.88     2.78     0.35    3.55
  29  BAYES_05    1077      0.62     1.97     0.16    2.52
  32  BAYES_95     897      0.52     1.64     5.93    2.10
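
One rough way to read these numbers is to divide each rule's %OFSPAM by its %OFHAM, as in the Python sketch below. The figures are copied from the table above; the ratio itself is only an illustrative yardstick and not something sa-stats reports, and the "ham" side is whatever SA classified as ham.

```python
# (%OFSPAM, %OFHAM) per rule, copied from the table above.
rows = {
    "BAYES_00": (0.44, 28.85),
    "BAYES_05": (0.16, 2.52),
    "BAYES_20": (0.35, 3.55),
    "BAYES_40": (0.65, 3.82),
    "BAYES_50": (14.49, 36.42),
    "BAYES_60": (6.68, 4.28),
    "BAYES_80": (7.23, 3.71),
    "BAYES_95": (5.93, 2.10),
    "BAYES_99": (64.07, 14.77),
}

# Ratio above 1: the rule fires more often on spam than on ham.
ratios = {rule: pct_spam / pct_ham for rule, (pct_spam, pct_ham) in rows.items()}

for rule in sorted(ratios):
    print(f"{rule}  {ratios[rule]:6.2f}")
```

A well-trained BAYES_99 would be expected to show a far larger ratio than the one computed here, which fits the complaint that it had to be scored way down.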


Andy


RE: Latest sa-stats from last week

2006-05-09 Thread Chris Santerre
> | > I'm in a similar situation as Bowie.  I had to turn off Bayes 
> | > as mail that was obviously spam was getting a Bayes_0 pulling 
> | > the # back down under the threshold.
> | > 
> | 
> | so why not just score BAYES_00, BAYES_20, etc all at 0... and keep
> | BAYES_99, BAYES_95, etc scoring what they score.  if you trust its spam
> | accuracy but not its ham accuracy, that would be the logical way to go i
> | would say?
> 
> 
> Hmm...good point.
> 
> I think I'll try that.


At least you got to smack your own head. Dallas usually just sneaks up on me and *SMACK*. And he don't have those delicate little ballerina hands! He calls it his "D'man sledgehammer fist of fury!" To this day, I still can't remember anything from 1988. I'm told I'm not missing much. 

--Chris 





Re: Latest sa-stats from last week

2006-05-09 Thread qqqq
| > I'm in a similar situation as Bowie.  I had to turn off Bayes 
| > as mail that was obviously spam was getting a Bayes_0 pulling 
| > the # back down under the threshold.
| > 
| 
| so why not just score BAYES_00, BAYES_20, etc all at 0... and keep
| BAYES_99, BAYES_95, etc scoring what they score.  if you trust its spam
| accuracy but not its ham accuracy, that would be the logical way to go i
| would say?


Hmm...good point.

I think I'll try that.  





RE: Latest sa-stats from last week

2006-05-09 Thread Dallas L. Engelken
> -Original Message-
> From:  [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, May 09, 2006 10:27
> To: Bowie Bailey; users@spamassassin.apache.org
> Subject: Re: Latest sa-stats from last week
> 
> | > Holy spoo! Bayes can do MUCH better than that!
> | > {O.O}
> |
> | I'm sure it can, but I've got per-user Bayes and most of my users 
> | don't bother to train it.
> |
> 
> I'm in a similar situation as Bowie.  I had to turn off Bayes 
> as mail that was obviously spam was getting a Bayes_0 pulling 
> the # back down under the threshold.
> 

so why not just score BAYES_00, BAYES_20, etc all at 0... and keep
BAYES_99, BAYES_95, etc scoring what they score.  if you trust its spam
accuracy but not its ham accuracy, that would be the logical way to go i
would say?
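
A small Python sketch of what that does to the totals. In local.cf the change would be lines of the form "score BAYES_00 0", one per ham-side rule; a score of 0 also stops SA from running that rule. All the numeric scores below and the 5.0 threshold are made up for illustration, they are not SpamAssassin's shipped defaults.

```python
REQUIRED = 5.0  # assumed spam threshold

# Hypothetical scores, for illustration only.
default_scores = {
    "BAYES_00": -2.5,
    "BAYES_20": -1.0,
    "BAYES_95": 3.0,
    "BAYES_99": 3.5,
    "RAZOR2_CHECK": 2.0,
    "URIBL_BLACK": 3.5,
}

# Same scores, but the ham-side Bayes rules are zeroed out.
spam_only_scores = dict(default_scores, BAYES_00=0.0, BAYES_20=0.0)


def total_score(hits, scores):
    """Sum the scores of every rule that hit the message."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores, required=REQUIRED):
    return total_score(hits, scores) >= required


# An obvious spam that a badly trained Bayes marked as BAYES_00.
hits = ["BAYES_00", "RAZOR2_CHECK", "URIBL_BLACK"]

print(total_score(hits, default_scores), is_spam(hits, default_scores))
print(total_score(hits, spam_only_scores), is_spam(hits, spam_only_scores))
```

With the negative BAYES_00 score in place the message is dragged under the threshold; with it zeroed, the other rules carry the message over. The cost is that a correct BAYES_00 can no longer rescue real ham from a false positive.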

d


Re: Latest sa-stats from last week

2006-05-09 Thread qqqq
| > Holy spoo! Bayes can do MUCH better than that!
| > {O.O}
|
| I'm sure it can, but I've got per-user Bayes and most of my users
| don't bother to train it.
|

I'm in a similar situation as Bowie.  I had to turn off Bayes as mail that was
obviously spam was getting a Bayes_0 pulling the # back down under the threshold.




