Fired rules stats understanding

2008-01-24 Thread Sébastien AVELINE




Hello,

Below are the rules that fire most often on my SpamAssassin installation.
I run SpamAssassin on several boxes, each with its own bayes_db files.
I use razor, dcc_check, uribl and bayes, and we handle hundreds of
thousands of messages per day.
Among my top spam rules you will see a lot of "collaborative" rules
such as razor, uribl and dcc_check. I wonder why there aren't more
heuristic and Bayesian rules near the top. Do my stats look "normal",
or is something wrong? Any suggestions are welcome.

Here are my top rules:

TOP SPAM RULES FIRED
-----------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFMAIL  %OFSPAM   %OFHAM
-----------------------------------------------------------------
   1  RDNS_NONE                  48417    99.60    99.82    99.41
   2  RAZOR2_CHECK               42113    42.50    86.82     2.88
   3  RAZOR2_CF_RANGE_51_100     41657    41.46    85.88     1.75
   4  URIBL_BLACK                41376    41.43    85.30     2.22
   5  RAZOR2_CF_RANGE_E8_51_100  39016    38.41    80.44     0.85
   6  URIBL_JP_SURBL             38221    37.21    78.80     0.05
   7  URIBL_OB_SURBL             32588    32.22    67.18     0.97
   8  URIBL_SC_SURBL             30849    30.03    63.60     0.02
   9  DCC_CHECK                  27472    28.92    56.64     4.14
  10  URIBL_AB_SURBL             26134    25.43    53.88     0.00
  11  HTML_MESSAGE               25531    60.93    52.63    68.35
  12  URIBL_WS_SURBL             23317    22.94    48.07     0.48
  13  DIGEST_MULTIPLE            23267    22.74    47.97     0.20
  14  URIBL_RHS_DOB              17797    17.42    36.69     0.20
  15  RAZOR2_CF_RANGE_E4_51_100  16500    16.55    34.02     0.94
  16  BAYES_50                   13772    14.69    28.39     2.44
  17  RCVD_IN_BL_SPAMCOP_NET     13594    13.48    28.03     0.48
  18  BAYES_99                   11330    11.06    23.36     0.07
  19  FORGED_MUA_OUTLOOK          9043     8.86    18.64     0.11
  20  STOX_REPLY_TYPE             8199     8.21    16.90     0.43
-----------------------------------------------------------------

TOP HAM RULES FIRED
-----------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFMAIL  %OFSPAM   %OFHAM
-----------------------------------------------------------------
   1  RDNS_NONE                  53945    99.60    99.82    99.41
   2  BAYES_00                   43583    45.21     5.94    80.31
   3  HTML_MESSAGE               37089    60.93    52.63    68.35
   4  MIME_HTML_ONLY             10131    16.80    14.71    18.67
   5  MIME_QP_LONG_LINE           4754     5.78     2.45     8.76
   6  URIBL_GREY                  3498     5.88     5.26     6.45
   7  HTML_IMAGE_RATIO_02         3053     3.82     1.79     5.63
   8  SUBJ_ALL_CAPS               2796     3.26     1.14     5.15
   9  SUBJECT_NEEDS_ENCODING      2520     2.77     0.68     4.64
  10  DCC_CHECK                   2248    28.92    56.64     4.14
  11  MSGID_MULTIPLE_AT           2212     2.16     0.02     4.08
  12  INVALID_DATE                2130     4.26     4.63     3.93
  13  HTML_MIME_NO_HTML_TAG       1889     2.41     1.21     3.48
  14  MPART_ALT_DIFF              1744     2.66     2.04     3.21
  15  MIME_HTML_MOSTLY            1580     1.84     0.65     2.91
  16  RAZOR2_CHECK                1564    42.50    86.82     2.88
  17  UNPARSEABLE_RELAY           1563     1.78     0.55     2.88
  18  EXTRA_MPART_TYPE            1557     2.04     1.11     2.87
  19  HTML_IMAGE_RATIO_04         1455     2.17     1.59     2.68
  20  BAYES_50                    1325    14.69    28.39     2.44
-----------------------------------------------------------------
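When reading the two tables above, a rule's raw count says less than how its spam hit rate compares with its ham hit rate. The sketch below computes S/O = %OFSPAM / (%OFSPAM + %OFHAM) for a few rows copied from the tables. This is similar in spirit to the S/O figure in SpamAssassin's rule-QA reports, but treating the two as identical is an assumption. A value near 1.0 means the rule fires almost only on spam; a value near 0.5 means it cannot tell spam from ham.

```python
# Rows copied from the tables above: (rule, %OFSPAM, %OFHAM)
ROWS = [
    ("RDNS_NONE", 99.82, 99.41),
    ("RAZOR2_CHECK", 86.82, 2.88),
    ("URIBL_AB_SURBL", 53.88, 0.00),
    ("HTML_MESSAGE", 52.63, 68.35),
    ("BAYES_50", 28.39, 2.44),
    ("BAYES_99", 23.36, 0.07),
]


def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Spam share of a rule's per-class hit rates: 1.0 = spam-only, 0.0 = ham-only."""
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5  # rule never fired; treat it as uninformative
    return spam_pct / total


def rank_by_s_over_o(rows):
    """Return (rule, S/O) pairs rounded to 3 places, most spam-specific first."""
    scored = [(rule, round(s_over_o(spam, ham), 3)) for rule, spam, ham in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for rule, ratio in rank_by_s_over_o(ROWS):
        print(f"{rule:<16} {ratio:.3f}")
```

On these rows BAYES_99 ranks second, just behind URIBL_AB_SURBL, even though it fires on far fewer spams, while RDNS_NONE comes out at about 0.50.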

Thanks in advance.

Seb.




Re: Feeding SA-learn

2008-01-24 Thread Anthony Peacock

John Thompson wrote:

On 2008-01-23, Anthony Peacock [EMAIL PROTECTED] wrote:

My intention was to manually feed the few spam messages that slip thru 
undetected. By the time I get a hold of those, they are in the 
recipient's mail client inbox, not in the server.
I was thinking, if I save the mail as EML files, would that preserve the 
headers in a way that sa-learn can parse correctly?



Depends on the client.

For instance, Thunderbird stores its folders in mbox format, so 
sa-learn can work against those files as-is.  Other email clients can 
save emails in text format complete with headers.


The biggest problem with this is training the users to do that consistently.


Isn't that what cron is for? :-)

I have a cron job on my imap server to regularly feed ham and spam 
through sa-learn.


I have a cron job that runs the learning process nightly.  I was 
referring to the process of gathering the false negatives and 
false positives.  That has to be done by hand, as someone has to decide 
whether they are spam or not.  And, by definition, the 
automatic process has got them wrong.
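A minimal sketch of the nightly learning job described above, assuming the hand-sorted false negatives and false positives are collected into mbox files. The folder paths are hypothetical placeholders; `--spam`, `--ham` and `--mbox` are standard sa-learn options. Building the command lists separately from running them makes them easy to check first.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical mbox files where hand-verified misclassifications are collected.
SPAM_FOLDERS = [Path("/var/mail/train/missed-spam")]
HAM_FOLDERS = [Path("/var/mail/train/false-positives")]


def build_commands(spam_folders, ham_folders):
    """Return one sa-learn argument list per folder, spam folders first."""
    commands = []
    for kind, folders in (("--spam", spam_folders), ("--ham", ham_folders)):
        for folder in folders:
            commands.append(["sa-learn", kind, "--mbox", str(folder)])
    return commands


def run(commands):
    """Run each sa-learn command in turn, stopping at the first failure."""
    if commands and shutil.which("sa-learn") is None:
        raise RuntimeError("sa-learn not found in PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only train on folders that actually exist tonight.
    existing = [cmd for cmd in build_commands(SPAM_FOLDERS, HAM_FOLDERS) if Path(cmd[-1]).exists()]
    run(existing)
```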



--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:http://www.chime.ucl.ac.uk/~rmhiajp/
A CAT scan should take less time than a PET scan.  For a CAT scan,
 they're only looking for one thing, whereas a PET scan could result in
 a lot of things.- Carl Princi, 2002/07/19


Re: Fired rules stats understanding

2008-01-24 Thread Matt Kettler

Sébastien AVELINE wrote:

Hello,

Below are the rules that fire most often on my SpamAssassin installation.
I run SpamAssassin on several boxes, each with its own bayes_db 
files. I use razor, dcc_check, uribl and bayes, and we handle hundreds of 
thousands of messages per day.
Among my top spam rules you will see a lot of collaborative rules 
such as razor, uribl and dcc_check. I wonder why there aren't more heuristic 
and Bayesian rules near the top. Do my stats look normal, 
or is something wrong? Any suggestions are welcome.


It's really absurd that RDNS_NONE is firing off on 99.6% of email.

Do you not have RDNS for your own network, or is it generating invalid 
Received: headers?


Ahh, yeah, it looks like your own network lacks RDNS:

Received: from unknown (HELO ?192.168.0.213?)
([EMAIL PROTECTED]@82.235.12.159) by smtpp.alinto.net with SMTP; Thu,
24 Jan 2008 09:30:20 +


If you've got a local nameserver, you might want to generate an 
in-addr.arpa zone for the 192.168.0.* network to fix that.
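For reference, the in-addr.arpa name that a PTR lookup queries can be computed with Python's standard `ipaddress` module. The sketch below does that and also generates PTR lines for a zone file covering 192.168.0.0/24. The `host-N.office.example.` naming scheme is a hypothetical placeholder, and the relative owner names assume a zone whose origin is 0.168.192.in-addr.arpa.

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Reverse-DNS (in-addr.arpa) name that a PTR lookup for `ip` will query."""
    return ipaddress.ip_address(ip).reverse_pointer


def ptr_records(network: str, domain: str):
    """Relative PTR lines for every host address in `network` (hypothetical naming scheme)."""
    net = ipaddress.ip_network(network)
    lines = []
    for host in net.hosts():
        last_octet = str(host).rsplit(".", 1)[1]
        lines.append(f"{last_octet}\tIN\tPTR\thost-{last_octet}.{domain}.")
    return lines


if __name__ == "__main__":
    print(ptr_name("192.168.0.213"))  # 213.0.168.192.in-addr.arpa
    for line in ptr_records("192.168.0.0/24", "office.example")[:3]:
        print(line)
```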


As for Bayes, that doesn't surprise me. There are 10 different Bayes 
rules, and while I'd expect that collectively they cover most of 
your mail, it's not surprising that none of them individually ranks 
high. It's a little surprising that BAYES_50 is doing so well compared to 
BAYES_99; with chi-squared combining I'd expect BAYES_99 to edge it 
out slightly. Are you doing any manual training? What does your sa-learn 
--dump magic output look like?
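To see how a single Bayes probability maps to exactly one BAYES_* rule, here is a sketch assuming the bucket boundaries of the stock 23_bayes.cf shipped with SpamAssassin 3.2.x. Check them against your own copy; the exact handling of the bucket edges is also an assumption. Because each message lands in only one bucket, Bayes hits are spread across several rules instead of piling up on one.

```python
# Assumed (lower, upper] boundaries, modeled on the stock 23_bayes.cf rules.
BAYES_BUCKETS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule(probability: float):
    """Return the name of the bucket rule a Bayes probability falls into, or None."""
    for name, lower, upper in BAYES_BUCKETS:
        # The lowest bucket also includes 0.0 itself.
        if (lower == 0.0 or probability > lower) and probability <= upper:
            return name
    return None
```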




Re: Feeding SA-learn

2008-01-24 Thread Diego Pomatta

John Thompson wrote:

On 2008-01-23, Diego Pomatta [EMAIL PROTECTED] wrote:

  
I use Thunderbird. There are two files for that folder: Junk.msf (7k) 
and Junk (53.172k). The msf file must be some kind of index. Do I just feed 
the biggest one to sa-learn?



Yup. Use sa-learn --spam --mbox Junk to learn your spam. You'll want 
to use the --mbox switch so sa-learn will process it as an mbox format 
mailbox, since that's what Thunderbird uses to store mail.


  

~/sa-learn --spam --mbox Junk
Learned tokens from 7 message(s) (7 message(s) examined)

Looks like it worked feeding it the entire Thunderbird Junk folder file. :)
Thanks all.

By the way, what's the difference between using sa-learn --spam ... and 
spamassassin --report ..., as Anthony suggested?


Regards
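To see how many messages a Thunderbird mbox file such as `Junk` holds before feeding it to sa-learn, Python's standard `mailbox` module can parse it. This is only a sanity-check sketch: comparing its count with sa-learn's "message(s) examined" figure can reveal parsing surprises, but the two parsers are not guaranteed to agree exactly.

```python
import mailbox


def count_messages(path: str) -> int:
    """Number of messages Python's mbox parser finds in the file at `path`."""
    box = mailbox.mbox(path, create=False)  # raises if the file does not exist
    try:
        return len(box)
    finally:
        box.close()
```

For example, `count_messages("Junk")` returns the number of messages Python finds in the Junk folder file.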


RE: whois plugin .. where to get it

2008-01-24 Thread Giampaolo Tomassoni
 -Original Message-
 From: Matt Kettler [mailto:[EMAIL PROTECTED]
 Sent: Thursday, January 24, 2008 6:38 AM
 
 Giampaolo Tomassoni wrote:
 
  Right, it is.
 
  The URIWhois does not detect the registrar. It detects the name and
 the
  address of the DNS- and whois-defined NSes for that domain.
 
 
 So how is this substantially different from the URIDNSBL plugin that
 comes with SA?

It can also check for mismatches between the DNS- and whois-defined
nameservers, for example. The sample URIWhois.cf shows two such uses:
PARTNSMIS fires when more than 50% of the two sets of nameservers
mismatch, and FULLNSMIS fires at more than 99.9%. As I said before,
the NSes defined in a whois record are harder to change (you often
have to wait many hours before a change takes effect). Spammers basically
never change them, but they may sometimes trick your DNS resolver into
looking at different NSes to resolve the domain.
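The plugin's exact definition of the mismatch percentage isn't given here, so the sketch below assumes one plausible definition: the share of nameservers, across the union of the DNS-reported and whois-reported sets, that appear in only one of them. The 50% and 99.9% thresholds for PARTNSMIS and FULLNSMIS come from the description above; the function names and sample hostnames are hypothetical.

```python
def ns_mismatch(dns_ns, whois_ns) -> float:
    """Fraction (0.0-1.0) of nameservers that appear in only one of the two sets.

    This definition is an assumption for illustration, not URIWhois' own code.
    """
    dns_set = {name.lower().rstrip(".") for name in dns_ns}
    whois_set = {name.lower().rstrip(".") for name in whois_ns}
    union = dns_set | whois_set
    if not union:
        return 0.0
    return len(dns_set ^ whois_set) / len(union)


def mismatch_rules(dns_ns, whois_ns):
    """Sample rule names that would fire under the assumed mismatch definition."""
    pct = ns_mismatch(dns_ns, whois_ns) * 100
    fired = []
    if pct > 50:
        fired.append("PARTNSMIS")
    if pct > 99.9:
        fired.append("FULLNSMIS")
    return fired
```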

An example of such DNS trickery was the hltcjkvhyokdotcom domain, but
now you can't get an NS RR for it even from gTLD-servers.net... Basically,
spammers seem to have recently abandoned this method. This doesn't mean they
can't use it again in the future, however. Quite interestingly, they began
abandoning this method a few weeks after the URIWhois plugin came out...


 Bear in mind this plugin *DOES* resolve the NSes for the domain, and
 DOES check those too. Take for example URIBL_SBL, which only makes
 sense
 in the context of the IP of the nameservers (since it's an IP based
 RBL).

Well, I use and like URIBL_SBL, but please note that a centralized solution
may easily be fooled the other way around, by serving it RRs that differ
from the ones most people will see and will query for through URIBL_SBL
itself. To do this, spammers only need to know the address of the DNS
server(s) acting as resolvers for SpamHaus...


 I guess you could say that looking up the IP of the host in the
 URL would also work, but that's an invitation for DoS, so it's not
 something URIDNSBL does.

Sorry, I didn't get this sentence. Do you mean performing a whois query on the
host address? In that case, where is the DoS? Please note that SpamHaus does
perform some whois queries on suspicious domains (probably not on IP
addresses, I don't know), so URIDNSBL doesn't need to. By the way, URIDNSBL is
meant to obtain data from BLs, not from whois...


 The only big difference I see at face value is it uses whois instead of
 DNS to find the NS records.. that hardly seems efficient..

It doesn't use whois *instead of* DNS. It uses both and even attempts to
detect any discrepancy between their responses.

Apart from the other differences I just mentioned, URIWhois also checks domain
age. I wrote this plugin mostly to detect that. I know that this information
is now also available through some BLs, but it is still coarser than what
URIWhois provides, and at the time I was developing this plugin a whois query
was the only means available to get it.

Please note I coded the URIWhois plugin for my own use, which means really
low whois traffic (we are talking about 500 to 1k messages/day handled by my
MXes). Since whois replies (both positive and negative ones) are cached by
this plugin, I'm probably not issuing more than 100-300 whois queries/day,
and these are spread among several registrars and NICs.
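Caching negative answers as well as positive ones is what keeps the query count that low. A minimal sketch of such a cache, assuming a fixed time-to-live and a caller-supplied lookup function; the `lookup` callable, the default TTL and the use of None as a negative answer are all hypothetical, not taken from the plugin.

```python
import time


class AnswerCache:
    """Caches lookup results, including negative (None) answers, for `ttl` seconds."""

    def __init__(self, lookup, ttl=86400.0, clock=time.monotonic):
        self._lookup = lookup  # hypothetical function: domain -> answer or None
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # domain -> (expiry time, answer)
        self.queries = 0  # how many times the real lookup was called

    def get(self, domain):
        key = domain.lower()
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # cached answer, positive or negative
        self.queries += 1
        answer = self._lookup(key)
        self._entries[key] = (now + self._ttl, answer)
        return answer
```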

I don't think that amount of traffic is enough to cause a DoS. ISPs know the
risks and probably stay away from the URIWhois plugin...

In summary, it is true that the effectiveness of the URIWhois plugin has
been somewhat reduced, both by spammers no longer faking DNS RRs and by BLs
implementing some of the functionality that URIWhois had. Nevertheless, it
worked for me for some months, and it served as a test case for the
asynchronous engine during the development of SA 3.2.x (which doesn't mean that
SA endorses it at all). It could also be improved to get things like the
registrar name or to detect missing replies to SOA and NS requests. So it wasn't,
and probably still isn't, completely useless.

That said, if someone wants to give it a try and can't find the download
URL, I'll say that it is really alpha code with a number of problems and
limitations, but I can also share its download link.
 
Is it wrong?

Giampaolo



Re: Feeding SA-learn

2008-01-24 Thread Anthony Peacock

Diego Pomatta wrote:


~/sa-learn --spam --mbox Junk
Learned tokens from 7 message(s) (7 message(s) examined)

Looks like it worked feeding it the entire Thunderbird Junk folder file. :)
Thanks all.

By the way, what's the difference between using sa-learn --spam ... and 
spamassassin --report ..., as Anthony suggested?


From:

http://spamassassin.apache.org/full/3.2.x/doc/spamassassin-run.html

-r, --report
Report this message as manually-verified spam. This will submit the 
mail message read from STDIN to various spam-blocker databases. 
Currently, these are the Distributed Checksum Clearinghouse 
http://www.rhyolite.com/anti-spam/dcc/, Pyzor 
http://pyzor.sourceforge.net/, Vipul's Razor 
http://razor.sourceforge.net/, and SpamCop http://www.spamcop.net/.


If the message contains SpamAssassin markup, the markup will be 
stripped out automatically before submission. The support modules for 
DCC, Pyzor, and Razor must be installed for spam to be reported to each 
service. SpamCop reports will have greater effect if you register and 
set the spamcop_to_address option.


The message will also be submitted to SpamAssassin's learning 
systems; currently this is the internal Bayesian statistical-filtering 
system (the BAYES rules). (Note that if you only want to perform 
statistical learning, and do not want to report mail to third-parties, 
you should use the sa-learn command directly instead.)


This option teaches the Bayesian system, but also submits to third party 
systems like DCC and SpamCop.


--
Anthony Peacock


Re: Fired rules stats understanding

2008-01-24 Thread Sébastien AVELINE

Matt Kettler wrote:


It's really absurd that RDNS_NONE is firing off on 99.6% of email.

Do you not have RDNS for your own network, or is it generating invalid 
Received: headers?


Ahh, yeah, it looks like your own network lacks RDNS:

Received: from unknown (HELO ?192.168.0.213?)
([EMAIL PROTECTED]@82.235.12.159) by smtpp.alinto.net with SMTP; Thu,
24 Jan 2008 09:30:20 +


If you've got a local nameserver, you might want to generate an 
in-addr.arpa zone for the 192.168.0.* network to fix that.


As for Bayes, that doesn't surprise me. There are 10 different Bayes 
rules, and while I'd expect that collectively they cover most of 
your mail, it's not surprising that none of them individually ranks 
high. It's a little surprising that BAYES_50 is doing so well compared to 
BAYES_99; with chi-squared combining I'd expect BAYES_99 to edge 
it out slightly. Are you doing any manual training? What does your 
sa-learn --dump magic output look like?


The local address is my office, from which I submit my mail to my 
mail servers. I don't think RDNS_NONE is the main worry. Unfortunately I 
don't use sa-learn to feed my Bayes database; I rely on the high volume of 
mail that comes into my servers.

Is it really effective to train Bayes manually?
Here is the output of sa-learn --dump magic:

0.000          0          3          0  non-token data: bayes db version
0.000          0    3803618          0  non-token data: nspam
0.000          0     862246          0  non-token data: nham
0.000          0     496111          0  non-token data: ntokens
0.000          0 1181735997          0  non-token data: oldest atime
0.000          0 1198170104          0  non-token data: newest atime
0.000          0 1181805393          0  non-token data: last journal sync atime
0.000          0 1181779437          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     476160          0  non-token data: last expire reduction count
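The atime fields in that output are Unix timestamps. A small sketch that parses `sa-learn --dump magic` lines, assuming the whitespace-separated layout shown above with the value in the third column, and converts the atime values to UTC dates so they are easier to read:

```python
from datetime import datetime, timezone

# A few lines copied from the output above.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0    3803618          0  non-token data: nspam
0.000          0     862246          0  non-token data: nham
0.000          0 1181735997          0  non-token data: oldest atime
0.000          0 1198170104          0  non-token data: newest atime
0.000          0 1181805393          0  non-token data: last journal sync atime
0.000          0 1181779437          0  non-token data: last expiry atime
"""


def parse_magic(text: str) -> dict:
    """Map each 'non-token data' name to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields = line.split()
        name = line.split("non-token data:", 1)[1].strip()
        values[name] = int(fields[2])
    return values


def as_utc(timestamp: int) -> str:
    """Render a Unix timestamp as an ISO-8601 UTC string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    magic = parse_magic(SAMPLE)
    print("spam/ham ratio:", round(magic["nspam"] / magic["nham"], 2))
    for key, value in magic.items():
        if key.endswith("atime"):
            print(f"{key:<24} {as_utc(value)}")
```

For example, the newest atime above, 1198170104, corresponds to 2007-12-20 17:01:44 UTC.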





Re: Expiry problem

2008-01-24 Thread Michael Parker


On Jan 23, 2008, at 9:54 PM, Steven Stern wrote:



It's finally started to remove tokens, so I think I'm OK. We use SQL
bayes, so it was an easy matter to use

delete from bayes_token where atime > UNIX_TIMESTAMP();

to clean up the stuff from the future.




But now your bayes_vars table is broken/off.  You might want to update  
those counts as well.


Michael


Upgrade 3.2.3-3.2.4 breaks rule override

2008-01-24 Thread Karl Boyken
We're running SpamAssassin from MIMEDefang 2.63 on RedHat Linux 
Enterprise Server 5.  We recently upgraded SpamAssassin from 3.2.3 to 
3.2.4.  We'd configured sa-mimedefang.cf to use a local Spamhaus mirror 
for __RCVD_IN_ZEN, RCVD_IN_XBL and RCVD_IN_PBL.  I just copied over the 
header lines from 20_dnsbl_tests.cf and replaced zen.spamhaus.org. 
with the domain that the local mirror uses.  The hack was working.  But 
when we upgraded to 3.2.4, it broke.  Any suggestions as to how to 
configure SpamAssassin to use the local Spamhaus mirror?  Thanks.


Karl Boyken

--
Karl Boyken, system administrator 
[EMAIL PROTECTED]
303A MLH, Dept. of Comp. Sci. 
http://www.cs.uiowa.edu/~boyken/
The U. of Iowa, Iowa City, IA  52242   319-335-2730 (voice) 
319-335-3668 (fax)




Re: Upgrade 3.2.3-3.2.4 breaks rule override

2008-01-24 Thread Matus UHLAR - fantomas
On 24.01.08 09:22, Karl Boyken wrote:
 We're running SpamAssassin from MIMEDefang 2.63 on RedHat Linux 
 Enterprise Server 5.  We recently upgraded SpamAssassin from 3.2.3 to 
 3.2.4.  We'd configured sa-mimedefang.cf to use a local Spamhaus mirror 
 for __RCVD_IN_ZEN, RCVD_IN_XBL and RCVD_IN_PBL.  I just copied over the 
 header lines from 20_dnsbl_tests.cf and replaced zen.spamhaus.org. 
 with the domain that the local mirror uses.  The hack was working.  But 
 when we upgraded to 3.2.4, it broke.  Any suggestions as to how to 
 configure SpamAssassin to use the local Spamhaus mirror?  Thanks.

Don't play with SpamAssassin; configure your DNS server to use the mirror
first.
-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I don't have lysdexia. The Dog wouldn't allow that.


p0f-analyzer.pm

2008-01-24 Thread Yet Another Ninja

Guys

- Centos 5.0
- Perl 5.8.8
- SA 3.2.4 + sa-updates

p0f-analyzer.pm from
http://bl0g.blogdns.com/spamassassin/p0f-analyzer.pm

p0f-analyzer.pm is spitting:

Jan 24 16:49:18 inet3 spamd[11516]: Use of uninitialized value in concatenation (.) or string at /etc/mail/spamassassin/p0f-analyzer.pm line 196, <GEN50> line 66.
Jan 24 16:49:18 inet3 spamd[11516]: Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Message.pm line 492, <GEN50> line 66.
Jan 24 16:49:26 inet3 spamd[11516]: Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Message.pm line 492.



It seems the p0f plugin has stopped working; with SA 3.2.3 it was fine.


anybody else seeing this?

thx
AXB




Re: Upgrade 3.2.3-3.2.4 breaks rule override

2008-01-24 Thread Kris Deugau

Karl Boyken wrote:
We're running SpamAssassin from MIMEDefang 2.63 on RedHat Linux 
Enterprise Server 5.  We recently upgraded SpamAssassin from 3.2.3 to 
3.2.4.  We'd configured sa-mimedefang.cf to use a local Spamhaus mirror 
for __RCVD_IN_ZEN, RCVD_IN_XBL and RCVD_IN_PBL.  I just copied over the 
header lines from 20_dnsbl_tests.cf and replaced zen.spamhaus.org. 
with the domain that the local mirror uses.  The hack was working.  But 
when we upgraded to 3.2.4, it broke.  Any suggestions as to how to 
configure SpamAssassin to use the local Spamhaus mirror?  Thanks.


Need more detail to be certain about how best to fix the problem;  you 
don't mention how you installed SA or what you mean by 'I just copied 
over the header lines...'


A couple of suggestions:

1) Mirror the Spamhaus data properly by configuring your DNS machines 
to refer to the local data when anything from that zone is requested. 
For example, with a BIND resolver, you'd (IIRC) add a forwarder entry 
for zen.spamhaus.org pointing to the local authoritative server.  In the 
long run this is probably a better solution than fiddling with upstream 
SA rule definitions.


2) Move your hack to the right place in the SA configuration.  It sounds 
like you edited the files in /usr/share/spamassassin; those files are 
provided by SA itself, and are supposedly clearly documented as "Don't 
touch - these WILL be overwritten by upgrades!!"  Simply place your 
changed configuration in local.cf (or some other .cf file in the same 
location), and SA will use that instead of its defaults.
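DNSBL lookups such as the one behind __RCVD_IN_ZEN query a name built from the relay's IPv4 address with its octets reversed, followed by the list's zone. The sketch below builds that name so you can check by hand (for example with dig) whether your resolver answers it from the local mirror. `zen.mirror.example.edu` is a hypothetical zone name, and 127.0.0.2 is the address DNSBLs conventionally list for testing.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Name queried to check an IPv4 address against a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"


if __name__ == "__main__":
    # Public zone vs. a hypothetical local mirror zone; query both and compare.
    for zone in ("zen.spamhaus.org.", "zen.mirror.example.edu"):
        print(dnsbl_query_name("127.0.0.2", zone))
```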


Wading through spamassassin -D --lint should give you more info on where 
the underlying problem is coming from.


-kgd


Re: whois plugin .. where to get it

2008-01-24 Thread John D. Hardin
On Thu, 24 Jan 2008, Jeff Chan wrote:

 Quoting Matt Kettler [EMAIL PROTECTED]:
 
  The only big difference I see at face value is it uses whois instead of
  DNS to find the NS records.. that hardly seems efficient..
 
 Whois is definitely the wrong protocol to use for automated
 testing, especially for any high volumes.  It was not designed or
 intended for that purpose, which is arguably abusive.

There seems to be a desire for someone to set up a URIBL for domains 
registered with spam-friendly registrars. That's the logical way to go 
about it. Then the domains could be bulk-updated from the registrar 
feeds rather than abusing whois.

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]FALaholic #11174 pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #12: Have a plan.
  USMC Rules of Gunfighting #13: Have a back-up plan, because the
  first one won't work.
---
 3 days until the 41st anniversary of the loss of Apollo 1



Re: Expiry problem

2008-01-24 Thread Steven Stern

Michael Parker wrote:


On Jan 23, 2008, at 9:54 PM, Steven Stern wrote:



It's finally started to remove tokens, so I think I'm OK. We use SQL
bayes, so it was an easy matter to use

delete from bayes_token where atime > UNIX_TIMESTAMP();

to clean up the stuff from the future.




But now your bayes_vars table is broken/off.  You might want to update 
those counts as well.




I did that, too.
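A sketch of both steps, deleting tokens with future access times and then refreshing the bayes_vars counters as Michael suggests, run against an in-memory SQLite database. The two tables here are a simplified, hypothetical subset of the real SQL Bayes schema (only the columns the example needs); check the column names against your own schema before running anything similar on MySQL.

```python
import sqlite3
import time


def make_demo_db(now: int) -> sqlite3.Connection:
    """In-memory stand-in for the SQL Bayes tables (simplified, hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bayes_token (id INTEGER, token TEXT, atime INTEGER)")
    conn.execute(
        "CREATE TABLE bayes_vars (id INTEGER, token_count INTEGER,"
        " oldest_token_age INTEGER, newest_token_age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO bayes_token VALUES (1, ?, ?)",
        [("a", now - 100), ("b", now - 10), ("c", now + 86400)],  # "c" is from the future
    )
    conn.execute("INSERT INTO bayes_vars VALUES (1, 3, ?, ?)", (now - 100, now + 86400))
    conn.commit()
    return conn


def clean_future_tokens(conn: sqlite3.Connection, now: int) -> int:
    """Delete tokens with atime in the future, then recompute the bayes_vars summary columns."""
    deleted = conn.execute("DELETE FROM bayes_token WHERE atime > ?", (now,)).rowcount
    conn.execute(
        """
        UPDATE bayes_vars SET
          token_count      = (SELECT COUNT(*)   FROM bayes_token t WHERE t.id = bayes_vars.id),
          oldest_token_age = (SELECT MIN(atime) FROM bayes_token t WHERE t.id = bayes_vars.id),
          newest_token_age = (SELECT MAX(atime) FROM bayes_token t WHERE t.id = bayes_vars.id)
        """
    )
    conn.commit()
    return deleted


if __name__ == "__main__":
    now = int(time.time())
    db = make_demo_db(now)
    print("deleted:", clean_future_tokens(db, now))
    print(db.execute("SELECT token_count, oldest_token_age, newest_token_age FROM bayes_vars").fetchone())
```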


Enable Spamcop only

2008-01-24 Thread Mofo_Jones

I am trying to set up my first SA and I can't seem to get SA to check
SpamCop. Below are my .cf files and debug information. Can someone
please tell me what I am doing wrong? Sorry for all the information given
below; I'm not sure what to do.
--Local.cf
# SpamAssassin config file for version 3.x
# NOTE: NOT COMPATIBLE WITH VERSIONS 2.5 or 2.6
# See http://www.yrex.com/spam/spamconfig25.php for earlier versions
# Generated by http://www.yrex.com/spam/spamconfig.php (version 1.50)

# How many hits before a message is considered spam.
required_score   5.0

# Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
report_safe 0

# Enable the Bayes system
use_bayes   0

# Enable Bayes auto-learning
bayes_auto_learn  0

# Enable or disable network checks
use_razor2  0
use_pyzor   0

# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.

# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_locales  all

skip_rbl_checks 0
rbl_timeout 15 # Timeout for lookups in seconds
add_header all RBL-Results _RBL_
#
---
# NOTE: donation tests, see README file for details

header RCVD_IN_BL_SPAMCOP_NET   eval:check_rbl_txt('spamcop', 'bl.spamcop.net.', '(?i:spamcop)')
describe RCVD_IN_BL_SPAMCOP_NET Received via a relay in bl.spamcop.net
tflags RCVD_IN_BL_SPAMCOP_NET   net
#reuse RCVD_IN_BL_SPAMCOP_NET
--init.pre
# This is the right place to customize your installation of SpamAssassin.
#
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
# This file contains plugin activation commands for plugins included
# in SpamAssassin 3.0.x releases.  It will not be installed if you
# already have a file in place called init.pre.
#
# There are now multiple files read to enable plugins in the 
# /etc/mail/spamassassin directory; previously only one, init.pre was 
# read.  Now both init.pre, v310.pre, and any other files ending in
# .pre will be read.  As future releases are made, new plugins will be
# added to new files, named according to the release they're added in.
###

# RelayCountry - add metadata for Bayes learning, marking the countries
# a message was relayed through
#
# Note: This requires the IP::Country::Fast Perl module
#
# loadplugin Mail::SpamAssassin::Plugin::RelayCountry

# URIDNSBL - look up URLs found in the message against several DNS
# blocklists.
#
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL

# Hashcash - perform hashcash verification.
#
loadplugin Mail::SpamAssassin::Plugin::Hashcash

# SPF - perform SPF verification.
#
loadplugin Mail::SpamAssassin::Plugin::SPF
--SA Debug
spamassassin -D --lint
[11631] dbg: logger: adding facilities: all
[11631] dbg: logger: logging level is DBG
[11631] dbg: generic: SpamAssassin version 3.2.3
[11631] dbg: config: score set 0 chosen.
[11631] dbg: util: running in taint mode? yes
[11631] dbg: util: taint mode: deleting unsafe environment variables, resetting PATH
[11631] dbg: util: PATH included '/usr/local/sbin', keeping
[11631] dbg: util: PATH included '/usr/local/bin', keeping
[11631] dbg: util: PATH included '/usr/sbin', keeping
[11631] dbg: util: PATH included '/usr/bin', keeping
[11631] dbg: util: PATH included '/sbin', keeping
[11631] dbg: util: PATH included '/bin', keeping
[11631] dbg: util: PATH included '/usr/games', keeping
[11631] dbg: util: final PATH set to: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
[11631] dbg: dns: no ipv6
[11631] dbg: dns: is Net::DNS::Resolver available? yes
[11631] dbg: dns: Net::DNS version: 0.60
[11631] dbg: diag: perl platform: 5.008008 linux
[11631] dbg: diag: module installed: Digest::SHA1, version 2.11
[11631] dbg: diag: module installed: HTML::Parser, version 3.56
[11631] dbg: diag: module installed: Net::DNS, version 0.60
[11631] dbg: diag: module installed: MIME::Base64, version 3.07
[11631] dbg: diag: module installed: DB_File, version 1.814
[11631] dbg: diag: module installed: Net::SMTP, version 2.29
[11631] dbg: diag: module installed: Mail::SPF, version v2.005
[11631] dbg: diag: module installed: Mail::SPF::Query, version 1.999001
[11631] dbg: diag: module not installed: IP::Country::Fast ('require' failed)
[11631] dbg: diag: module not installed: Razor2::Client::Agent ('require' failed)
[11631] dbg: diag: module not installed: Net::Ident ('require' failed)
[11631] dbg: diag: module not installed: IO::Socket::INET6 ('require' failed)
[11631] dbg: diag: module not installed: IO::Socket::SSL ('require' failed)

Re: Enable Spamcop only

2008-01-24 Thread John D. Hardin
On Thu, 24 Jan 2008, Mofo_Jones wrote:

 I am trying to set up my first SA and I can't seem to get SA to check
 SpamCop. Below are my .cf files and debug information. Can someone
 please tell me what I am doing wrong?

 [11631] dbg: plugin: loading Mail::SpamAssassin::Plugin::SpamCop from @INC
 [11631] dbg: reporter: local tests only, disabling SpamCop

You can't use --lint to test spamcop.

Compose or obtain a test message and feed it to spamassassin with 
debug flags turned on, then you'll be able to see (and tell us) what's 
happening.




Re: Enable Spamcop only

2008-01-24 Thread Mofo_Jones

John, I have been looking everywhere for how to send a test message that will
show up as an RBL hit. Do you know how to test this? I have Googled myself
almost to death.


-- 
View this message in context: 
http://www.nabble.com/Enable-Spamcop-only-tp15072295p15075042.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Enable Spamcop only

2008-01-24 Thread Jari Fredriksson
 John, I have been looking everywhere for how to send a
 test message that will show up as an RBL hit. Do you know how
 to test this? I have Googled myself almost to death.
 


spamassassin -D message.txt

Here, message.txt is the file containing the message to test.




Re: Enable Spamcop only

2008-01-24 Thread Mofo_Jones

Sorry, what I meant was: how do I send an email to the SA server that will be
tagged, so I can see the tag in the message?


-- 
View this message in context: 
http://www.nabble.com/Enable-Spamcop-only-tp15072295p15075630.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Particular subject blacklist seems not to work

2008-01-24 Thread spamassassin


I am running SpamAssassin version 3.1.7 with Postfix via amavisd on a 
FreeBSD machine.


In the last few weeks, all of a sudden messages with the same 4 or 5 
subject lines started coming through undetected for some reason.


So I decided to add patterns matching those to 
/usr/local/share/spamassassin/60_whitelist_subject.cf


They are in the form of:

blacklist_subject   *string*


All of them seemed to work, except for one. I continue to get messages 
with the following Subject header:


:: 86% Cheaper than Original Price: aRolex, Cartier, Omega, Chanel, Tag Heuer,


I had tried adding the following entries:

blacklist_subject   *Cheaper than Original Price*
blacklist_subject   *aRolex*


...but to no avail.


Is there some pattern in that subject line that allows it to come through 
unscathed?


Thanks,

-FONG



 -
 shot through the heart  ooh baby do you know what that's worth
 and you're to blame ooh heaven is a place on earth
 darling you give love  they say in heaven love comes first
 a bad name  we'll make heaven a place on earth
 ORBITAL Halcyon Live


Re: Enable Spamcop only

2008-01-24 Thread John D. Hardin
On Thu, 24 Jan 2008, Mofo_Jones wrote:

  spamassassin -D message.txt
  
  Where message.txt is containing the message to test.

 Sorry, what I meant was: how do I send an email to the SA server
 that will be tagged, so I can see it in the message?

If your SA is configured to add status headers, the command above will
do that. 

What you need to do is compose a message.txt file that looks like an
RFC-822-format email message, or obtain one from a message store
somewhere. Your mailboxes on the server are one possible source, you'd
just copy one message out of them using (for instance) a text editor.
Getting an RFC-822-format message file out of a mail client is
generally a hassle compared to just editing a Unix mailbox file.
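As a rough illustration of the "compose a message.txt" route, the Python sketch below writes a minimal RFC 822-style message using the standard library's `email` package. The addresses, subject and body are placeholders, not anything from this thread. Note that a hand-built message like this has no Received headers, and DNS blocklist checks work from the relay IPs in those headers, so for testing a blocklist such as SpamCop's, a copy of a real received message is likely the more useful test file.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_test_message(path: str = "message.txt") -> EmailMessage:
    """Write a minimal RFC 822-style message to `path` and return it."""
    msg = EmailMessage()
    # Placeholder addresses (hypothetical); use ones from your own domain.
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.net"
    msg["Subject"] = "SpamAssassin test message"
    msg["Date"] = formatdate(localtime=True)
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg.set_content("This is a test message body.\n")
    # as_bytes() serialises the headers, a blank line, then the body
    with open(path, "wb") as fh:
        fh.write(msg.as_bytes())
    return msg


if __name__ == "__main__":
    build_test_message()
```

After running it, `spamassassin -D message.txt` would process the file, and the debug output should show which tests were attempted.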

You should be able to see in the debug output whether or not SpamCop
is being run. Note that you may not get any points from SpamCop, as
you'd need to use a test message that SpamCop considers spammy, and I
don't know how to do that.

Also: please break the habit of top-posting, and learn the habit of 
pruning your reply. Thanks.
 
--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Perfect Security is unattainable; beware those who would try to sell
  it to you, regardless of the cost, for they are trying to sell you
  your own slavery.
---
 3 days until Wolfgang Amadeus Mozart's 252nd Birthday



Re: Particular subject blacklist seems not to work

2008-01-24 Thread John D. Hardin
On Thu, 24 Jan 2008 [EMAIL PROTECTED] wrote:

 In the last few weeks, all of a sudden messages with the same 4 or 5 
 subject lines started coming through undetected for some reason.
 
 So I decided to add patterns matching those to 
 /usr/local/share/spamassassin/60_whitelist_subject.cf

Silly question: are you sure that your WhiteListSubject plugin is even 
working in the first place?




Re: Particular subject blacklist seems not to work

2008-01-24 Thread spamassassin


I am fairly sure. The other subject lines started getting flagged when I 
added entries for them. And I sent emails from an outside account with a 
subject that matched one of the other patterns and it got flagged.


Is there a more concrete way to determine whether 60_whitelist_subject.cf 
is actually working?


On Thu, 24 Jan 2008, John D. Hardin wrote:


On Thu, 24 Jan 2008 [EMAIL PROTECTED] wrote:


In the last few weeks, all of a sudden messages with the same 4 or 5
subject lines started coming through undetected for some reason.

So I decided to add patterns matching those to
/usr/local/share/spamassassin/60_whitelist_subject.cf


Silly question: are you sure that your WhiteListSubject plugin is even
working in the first place?





Re: Particular subject blacklist seems not to work

2008-01-24 Thread John D. Hardin
On Thu, 24 Jan 2008 [EMAIL PROTECTED] wrote:

 I am fairly sure. The other subject lines started getting flagged
 when I added entries for them. And I sent emails from an outside
 account with a subject that matched one of the other patterns and
 it got flagged.
 
 Is there a more concrete way to determine whether
 60_whitelist_subject.cf is actually working?

No, that's all I had in mind. If you have other rules where it's 
hitting then the plugin is working.

Are any of those other rules blacklist rules?

Can you create a blacklist test rule and prove that it works?

I'm not familiar with the WhiteListSubject plugin, but I wonder about
the match syntax. Everything else uses REs, so try the rules (1)
without the * wildcards at all, and (2) with .* instead of *, just
in case it's not *really* using fileglob syntax... :)
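To see why the wildcard syntax matters, here is a small Python comparison (not the plugin's own Perl code) of how the same pattern behaves under shell-glob rules versus regular-expression rules. Case-insensitive matching is an assumption made here purely for illustration.

```python
import fnmatch
import re

SUBJECT = ":: 86% Cheaper than Original Price: aRolex, Cartier, Omega, Chanel, Tag Heuer,"


def glob_match(pattern: str, subject: str) -> bool:
    # Shell-style glob: '*' means "any run of characters" and the pattern
    # must cover the whole subject. Lower-casing both sides makes it
    # case-insensitive (an assumption for this sketch).
    return fnmatch.fnmatch(subject.lower(), pattern.lower())


def regex_match(pattern: str, subject: str) -> bool:
    # Regular expression: may match anywhere in the subject.
    # A pattern that is not a valid RE is reported as "no match".
    try:
        return re.search(pattern, subject, re.IGNORECASE) is not None
    except re.error:
        return False


if __name__ == "__main__":
    for pat in ("*aRolex*", "aRolex", ".*aRolex.*"):
        print(f"{pat!r:14} glob={glob_match(pat, SUBJECT)} regex={regex_match(pat, SUBJECT)}")
```

Under glob rules `*aRolex*` matches but a bare `aRolex` does not; under RE rules `*aRolex*` is not even a valid expression, while `aRolex` and `.*aRolex.*` both match. Trying both variants therefore narrows down which syntax the plugin actually expects.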




Re: whois plugin .. where to get it

2008-01-24 Thread Matt Kettler

John D. Hardin wrote:

On Thu, 24 Jan 2008, Jeff Chan wrote:

  

Quoting Matt Kettler [EMAIL PROTECTED]:



The only big difference I see at face value is it uses whois instead of
DNS to find the NS records.. that hardly seems efficient..
  

Whois is definitely the wrong protocol to use for automated
testing, especially at high volumes.  It was not designed or
intended for that purpose, so using it that way is arguably abusive.



There seems to be a desire for someone to set up a URIBL for domains 
registered with spam-friendly registrars. That's the logical way to go 
about it. Then the domains could be bulk-updated from the registrar 
feeds rather than abusing whois.


True, but the whois plugin doesn't check registrars. It checks 
nameservers advertised in whois. Period.


To quote its author:

The URIWhois does not detect the registrar. It detects the name and the
address of the DNS- and whois-defined NSes for that domain.



Re: whois plugin .. where to get it

2008-01-24 Thread Matt Kettler

Giampaolo Tomassoni wrote:

-Original Message-
From: Matt Kettler [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 24, 2008 6:38 AM

Giampaolo Tomassoni wrote:


Right, it is.

The URIWhois does not detect the registrar. It detects the name and the
address of the DNS- and whois-defined NSes for that domain.

  

So how is this substantially different from the URIDNSBL plugin that
comes with SA?



It can also check for mismatches between the DNS- and whois-defined
nameservers, for example. The sample URIWhois.cf shows two such uses:
PARTNSMIS, which fires when more than 50% of the two sets of nameservers
mismatch, and FULLNSMIS, which fires at more than 99.9%. As I previously
said, the NSes defined in a whois record are more difficult to change (you
often have to wait many hours before the change takes effect). Spammers
basically never change them, but they may sometimes fool your DNS resolver
into looking at different NSes to resolve the domain.

An example of such DNS fooling was the hltcjkvhyokdotcom domain, but
now you can't get an NS RR for it even from gTLD-servers.net... Basically,
spammers seem to have recently abandoned this method. This doesn't mean they
can't use it again in the future, however. Quite interestingly, they began
abandoning this method a few weeks after the URIWhois plugin was out...
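The thread does not say exactly how URIWhois computes its mismatch percentage, so the following Python sketch is only a guess at one plausible definition: the share of nameservers, across both sets, that appear in only one of them. The thresholds mirror the PARTNSMIS (more than 50%) and FULLNSMIS (more than 99.9%) rules described in the message; the function names are hypothetical and this is not the plugin's code.

```python
def ns_mismatch_percent(dns_ns: set[str], whois_ns: set[str]) -> float:
    """Percentage of all seen nameservers that appear in only one of the two sets."""
    # DNS names are case-insensitive and may carry a trailing dot
    a = {n.lower().rstrip(".") for n in dns_ns}
    b = {n.lower().rstrip(".") for n in whois_ns}
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a ^ b) / len(union)  # symmetric difference over union


def mismatch_rules(dns_ns: set[str], whois_ns: set[str]) -> list[str]:
    """Return the (hypothetically defined) rule names that would fire."""
    pct = ns_mismatch_percent(dns_ns, whois_ns)
    hits = []
    if pct > 50.0:
        hits.append("PARTNSMIS")
    if pct > 99.9:
        hits.append("FULLNSMIS")
    return hits
```

With this definition, identical sets score 0%, completely disjoint sets score 100% (so both rules fire), and sharing one of three distinct nameservers scores about 66.7% (PARTNSMIS only).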


  

Bear in mind this plugin *DOES* resolve the NSes for the domain, and
DOES check those too. Take for example URIBL_SBL, which only makes
sense in the context of the IP of the nameservers (since it's an
IP-based RBL).



Well, I use and like URIBL_SBL, but please note that a centralized solution
may easily be fooled the other way around, by feeding it RRs which are not
the ones most people will see and query for through URIBL_SBL itself. To
do this, spammers only need to know the addresses of the DNS server(s)
acting as resolvers for SpamHaus...


  

I guess you could say that looking up the IP of the host in the
URL would also work, but that's an invitation for DoS, so it's not
something URIDNSBL does.



Sorry, I didn't get this sentence. Do you mean performing a whois on the
host address? In that case, where is the DoS? Please note that SpamHaus does
perform some whois queries about suspicious domains (probably not IP
addresses, I don't know), so URIDNSBL doesn't need to. By the way, URIDNSBL is
meant to obtain data from BLs, not from whois...


  

The only big difference I see at face value is it uses whois instead of
DNS to find the NS records.. that hardly seems efficient..



It doesn't use whois *instead of* dns. It uses both and attempts even to
detect any discrepancy between their responses.
  
How are these going to be different?? The information published to whois 
has to match the information published to the authoritative DNS servers 
for the TLD the domain falls under.


I guess you could send a request to one of the servers for the domain 
and ask for a NS record. But that's asking for a DoS. You could also 
still do it a lot more efficiently by sending one to the authority for 
the TLD, and one to the domain server.







Re: whois plugin .. where to get it

2008-01-24 Thread Matt Kettler

Matt Kettler wrote:

Giampaolo Tomassoni wrote:


It doesn't use whois *instead of* dns. It uses both and attempts even to
detect any discrepancy between their responses.
  
How are these going to be different?? The information published to 
whois has to match the information published to the authoritative DNS 
servers for the TLD the domain falls under.


I guess you could send a request to one of the servers for the domain 
and ask for a NS record. But that's asking for a DoS. You could also 
still do it a lot more efficiently by sending one to the authority for 
the TLD, and one to the domain server.


Ahh, I see what you're doing: you're looking up the SOA, which
basically forces the query down to the spammer's DNS server and
opens you up to a DoS attack.


Hint: a malicious spammer could fill an email with domains that point
to a server which generates really slow responses to your SOA queries,
bogging your server down with DNS timeouts. This is the whole reason
why nothing in SA ever does an A record lookup on URIs. Doing an SOA
lookup isn't quite as bad, as it would take many domains instead of many
hosts, but it's still the same concept.
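One general way to limit the kind of stall described here is to give all of a message's lookups a single shared time budget instead of waiting on each one in turn. The Python sketch below illustrates that idea with a thread pool; `resolve` stands in for whatever lookup function is used (hypothetical here), the worker cap is an arbitrary choice, and this is not how SpamAssassin itself is implemented.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Iterable, Optional

MAX_WORKERS = 8  # cap concurrency so a message full of domains can't spawn unbounded threads


def lookup_all(domains: Iterable[str],
               resolve: Callable[[str], str],
               budget_s: float) -> Dict[str, Optional[str]]:
    """Resolve each distinct domain; give up on whatever is unfinished after budget_s."""
    unique = list(dict.fromkeys(domains))  # de-duplicate while keeping order
    results: Dict[str, Optional[str]] = {d: None for d in unique}
    pool = cf.ThreadPoolExecutor(max_workers=MAX_WORKERS)
    futures = {pool.submit(resolve, d): d for d in unique}
    # One deadline for the whole batch, not one per lookup
    done, _pending = cf.wait(futures, timeout=budget_s)
    for fut in done:
        try:
            results[futures[fut]] = fut.result()
        except Exception:
            pass  # a failed lookup is treated the same as "no answer"
    # Don't block on stragglers; queued-but-unstarted lookups are cancelled (Python 3.9+)
    pool.shutdown(wait=False, cancel_futures=True)
    return results
```

A lookup that is already running still occupies its worker thread until it returns, so in practice each individual query would also need its own timeout; the shared budget only bounds how long the scan of one message waits.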















Re: Feeding SA-learn

2008-01-24 Thread John Thompson
On 2008-01-24, Anthony Peacock [EMAIL PROTECTED] wrote:

 John Thompson wrote:
 
 Isn't that what cron is for? :-)
 
 I have a cron job on my imap server to regularly feed ham and spam 
 through sa-learn.

 I have a cron job that runs the learning process nightly.  I was
 referring to the process of gathering the false negatives and
 false positives.  That has to be done by hand, as a decision needs to be
 made about whether they are spam or not.  And, by definition, the
 automatic process has got it wrong.

Right. So I manually sort the false negatives/positives into their
proper places (I don't usually get more than a couple a day) and let the
cron job learn them later.
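For the "let the cron job learn them later" step, a minimal sketch might look like the Python script below. The folder paths are hypothetical placeholders for wherever the hand-sorted false negatives and false positives end up, assuming one message per file as in a Maildir; `sa-learn --spam` and `sa-learn --ham` are the standard options for training on spam and ham respectively.

```python
import shutil
import subprocess
from typing import Dict, List

# Hypothetical folder locations; adjust to wherever misclassified mail is sorted.
FOLDERS: Dict[str, str] = {
    "--spam": "/home/user/Maildir/.LearnSpam/cur",
    "--ham": "/home/user/Maildir/.LearnHam/cur",
}


def learn_commands(folders: Dict[str, str] = FOLDERS) -> List[List[str]]:
    """Build one sa-learn invocation per folder, e.g. ['sa-learn', '--spam', path]."""
    return [["sa-learn", flag, path] for flag, path in folders.items()]


def main() -> None:
    if shutil.which("sa-learn") is None:
        raise SystemExit("sa-learn not found on PATH")
    for cmd in learn_commands():
        # check=True stops the run if sa-learn reports an error
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

A nightly crontab entry along the lines of `15 3 * * * /usr/bin/python3 /usr/local/bin/learn_sorted.py` (script path hypothetical) would then pick up whatever was sorted during the day.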

-- 

John ([EMAIL PROTECTED])



Re: Feeding SA-learn

2008-01-24 Thread John Thompson
On 2008-01-24, Mark Johnson [EMAIL PROTECTED] wrote:

 John Thompson wrote:

 Isn't that what cron is for? :-)
 
 I have a cron job on my imap server to regularly feed ham and spam 
 through sa-learn.

 Do you delete the messages from the IMAP folder after you learn them? 
 If so, how do you go about that?  I'm pretty sure that if I deleted the mail 
 files from the command line, I'd have to run a reconstruct on the mailbox 
 or the folder would throw errors on the client.  This is on a Cyrus IMAP server.

No. I use Thunderbird and just set the Junk filter controls to expire 
junk messages after a couple weeks.

-- 

John ([EMAIL PROTECTED])



Re: Feeding SA-learn

2008-01-24 Thread Mark Johnson

John Thompson wrote:


No. I use Thunderbird and just set the Junk filter controls to expire 
junk messages after a couple weeks.




Interesting idea!  Thanks for the tips!  You have no idea how much time 
and how many steps this is going to save me.


--
Mark Johnson
http://www.astroshapes.com/information-technology/blog



Re: Particular subject blacklist seems not to work

2008-01-24 Thread Matt Kettler

[EMAIL PROTECTED] wrote:


I am running SpamAssassin version 3.1.7 with Postfix via amavisd on a 
FreeBSD machine.


In the last few weeks, all of a sudden messages with the same 4 or 5 
subject lines started coming through undetected for some reason.


So I decided to add patterns matching those to 
/usr/local/share/spamassassin/60_whitelist_subject.cf


They are in the form of:

blacklist_subject   *string*


All of them seemed to work, except for one. I continue to get messages 
with the following Subject header:


:: 86% Cheaper than Original Price: aRolex, Cartier, Omega, Chanel, Tag Heuer,



I had tried adding the following entries:

blacklist_subject   *Cheaper than Original Price*
blacklist_subject   *aRolex*


...but to no avail.


Is there some pattern in that subject line that allows it to come 
through unscathed? 


You might want to look at the message source. I'm not intimately 
familiar with the whitelist subject plugin, but it's possible there's 
some kind of HTML or encoding in the original subject line that your 
email client is translating, but the whitelist_subject plugin isn't...
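To illustrate the encoding possibility: a Subject header can be sent as an RFC 2047 encoded-word, which mail clients decode for display. The Python sketch below encodes a shortened version of the spam subject that way and shows that the raw header no longer contains the literal text, while the decoded form does. Whether the subject plugin sees the raw or the decoded header is exactly the open question here; the helper names are hypothetical.

```python
import base64
from email.header import decode_header, make_header

# Shortened version of the spam subject, for illustration
SUBJECT = "86% Cheaper than Original Price: aRolex"


def encode_b(text: str, charset: str = "utf-8") -> str:
    """Build an RFC 2047 'B' encoded-word: =?charset?B?base64?="""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def decoded_subject(raw: str) -> str:
    """Decode any encoded-words in a raw header value to plain text."""
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    raw = encode_b(SUBJECT)
    print("raw:    ", raw)
    print("decoded:", decoded_subject(raw))
```

A pattern such as `*aRolex*` applied to the raw value would find nothing, even though every client shows the decoded subject.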


You might also check for two subject headers in the message.. SA might 
be using one, but your client may be using the other..
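Checking for a duplicated Subject header is easy to script. In Python's `email` package, indexing a message returns only the first occurrence of a header, while `get_all()` returns every one; the sample message below is made up for illustration.

```python
import email

# Made-up message with two Subject headers
RAW = (
    "From: sender@example.com\n"
    "Subject: Hello friend\n"
    "Subject: 86% Cheaper than Original Price: aRolex\n"
    "\n"
    "Body text.\n"
)


def subject_headers(raw: str) -> list:
    """Return every Subject header value, in the order they appear."""
    msg = email.message_from_string(raw)
    return msg.get_all("Subject", [])


if __name__ == "__main__":
    subjects = subject_headers(RAW)
    if len(subjects) > 1:
        print(f"{len(subjects)} Subject headers found:")
        for s in subjects:
            print("  ", s)
```

Two programs that each read "the" Subject header can easily pick different ones, which is the mismatch suspected above.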