RE: sa-learn help

2005-03-18 Thread Gary W. Smith
The problem is that when they forward the email you will lose the
headers and it will think they are the spam/hammers.

-----Original Message-----
From: Matt [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 17, 2005 3:05 PM
To: [EMAIL PROTECTED]
Subject: sa-learn help

I am running a DirectAdmin server that uses Exim and SpamAssassin 3.0.2
release.  I would like to create two email addresses such as
[EMAIL PROTECTED] and [EMAIL PROTECTED]  Then I would like to ask all our
users to forward their ham or spam to these addresses as an attachment.
Then magically have some cron job that runs sa-learn on them every 5
minutes or so.

Has anyone done something like this?  If so, how?  Most of our users use
Outlook Express for email.  Nearly 1000 email accounts.

Also, SpamAssassin seems to create a separate Bayes file for each user.
For this I would like to have these addresses cover all domains and users
on the server.  Is that possible?

Thanks

Matt 




bayes test

2005-03-18 Thread aktor
Hi,

I wonder how I have to train SpamAssassin to get the BAYES_XX tests to
start working.

I have a rule that trains the Bayesian filter with each email I
receive, using the sa-learn tool. After some months of training (I
thought I needed 200 spam and 200 ham) I haven't seen it yet.

The last spam my SpamAssassin caught had these tests:


Return-Path: [EMAIL PROTECTED]
X-Original-To: aktor{@|aktornet.ath.cx
Delivered-To: aktor{@|aktornet.ath.cx
Received: from 203.90.52.8 (unknown [203.90.52.8])
by aktornet.ath.cx (Postfix) with SMTP id 375F6BB49
for aktor{@|aktornet.ath.cx; Thu, 17 Mar 2005 06:19:52 +0100 (CET)
From: ydlBobby [EMAIL PROTECTED]
To: aktor{@|aktornet.ath.cx
Subject: Better than Vìagra and cheaper, too! npdu
Sender: ydlBobby [EMAIL PROTECTED]
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Date: Wed, 16 Mar 2005 22:25:20 -0600
X-Mailer: Microsoft Outlook Express 5.00.2615.200
Message-Id: [EMAIL PROTECTED]
X-Virus-Scanned: por AMAVIS + CLAMAV en aktornet.ath.cx
X-Amavis-Alert: BAD HEADER Non-encoded 8-bit data (char EC hex) in
message header 'Subject'Subject: Better than V\354agra and cheape... ^
X-Spam-Status: Yes, hits=11.1 tagged_above=0.0 required=4.0
tests=DRUGS_ERECTILE, DRUGS_ERECTILE_OBFU, FORGED_HOTMAIL_RCVD2,
FORGED_MUA_OUTLOOK, INFO_TLD, MSGID_FROM_MTA_ID, RCVD_NUMERIC_HELO
X-Spam-Level: ***
X-Spam-Flag: YES


No BAYES_XX test.

I use spamassassin through amavisd-new, with Mail::SpamAssassin Perl
module, with default options.

[EMAIL PROTECTED] aktor $ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        568          0  non-token data: nspam
0.000          0       1996          0  non-token data: nham
0.000          0     203190          0  non-token data: ntokens
0.000          0 1086896787          0  non-token data: oldest atime
0.000          0     102059          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0     102285          0  non-token data: last expiry atime
0.000          0   29436939          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Do I have to do something else? What am I doing wrong?

Thank you,

aktor
-- 
Blessed are the pessimists, for they shall make backups.
-- Www.frases.com. 

This mail is copyleft-ed to aktor under the terms of the CC License
(Creative Commons). 




Re: URI Tests and Japanese Chars (solved)

2005-03-18 Thread List Mail User
...
To: Daryl C. W. O'Shea [EMAIL PROTECTED]
Cc: List Mail User [EMAIL PROTECTED], [EMAIL PROTECTED],
users@spamassassin.apache.org
Subject: Re: URI Tests and Japanese Chars (solved) 
In-Reply-To: [EMAIL PROTECTED] 
From: [EMAIL PROTECTED] (Justin Mason)

Justin,

Daryl C. W. O'Shea writes:
 List Mail User wrote:
 Jeff,
  
 RFC 1630 makes it pretty clear that an email address in either a "mailto:"
  or "cid:" clause *is* a URI.  It does not address whether a bare email
  address would count (it seems that it doesn't fit the RFC definition, but
  does fit some others I found via Google).
  
 I could be convinced either way for a bare address (as it stands now,
  maybe someone else has something to add).  But a "mailto:", "mail:" or
  "cid:" clause should (in my opinion) be looked up by the URI rules - they
  are URI, not URL rules (though URLs are clearly the most common form of URIs).
  
 I was surprised to see from the RFC that even "Msg-Id:" clauses
  are URIs.
  
 Paul Shupak
 [EMAIL PROTECTED]
 
 I'd agree with Paul, what's the difference between doing the lookup of 
 the domain listed in a mailto: link and a http: link -- both of which 
 are often found in someone's signature?
 
 Eliminating the mailto: domain lookup could lead to spam such as "email 
 us at [EMAIL PROTECTED] for all the junk you don't really want".

However, it's an impedance mismatch between what's going into the backends
(the SBL and SURBL uribls) and what we're matching on the other end.

At least for SBL, it's definitely problematic, since a SBL escalation
(of mail relays) will blocklist mail that *mentions* that domain!

That's not true in general.  Since the SBL is an IP-based list,
a mail server escalation would have no effect on any other domain, only
on messages relayed through the servers.

The more common case where a SBL escalation will affect other domains
is (the typical kind I've noticed) when they list all corporate servers and
some otherwise innocent domains use name servers within that space (this was
the Russian government/Rostelecom earlier this week).

Still, you are correct, there is a big difference between the SURBL
policy of zero FPs and the SBL policy, which I can best state as "kill the
spammers".  SURBLs rarely have `collateral' damage and their default scores
reflect that;  the URIBL_SBL is only assigned scores of "0 0.629 0 0.996"
in 3.0.2 - only URIBL_AB_SURBL with set 3 and URIBL_WS_SURBL with set 1 are
ever assigned lower scores than the URIBL_SBL.  All the other SURBLs have
significantly higher scores - URIBL_SC_SURBL is many times what URIBL_SBL is.
(You may not know, but I even proposed adding back the SPEWS lists, though
with low scores, and I do use all the rfci lists with relatively low scores
except for bogusmx, which may be the best single indicator I have ever found,
and I still assign it fewer points than URIBL_SC_SURBL).
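For readers unfamiliar with the four-number score notation used above: as far as I understand the SpamAssassin "score" directive, the four values correspond to score sets 0 to 3, selected by whether Bayes and network tests are in use. The Python sketch below only illustrates that selection; the function name pick_score is made up for this example and is not part of SpamAssassin.

```python
def pick_score(scores, bayes_enabled, net_enabled):
    """Return the score for the active score set.

    Set 0: neither Bayes nor network tests
    Set 1: network tests only
    Set 2: Bayes only
    Set 3: Bayes and network tests
    """
    index = (2 if bayes_enabled else 0) + (1 if net_enabled else 0)
    return scores[index]


# The 3.0.2 default scores for URIBL_SBL quoted in the message above.
URIBL_SBL = (0, 0.629, 0, 0.996)

print(pick_score(URIBL_SBL, bayes_enabled=True, net_enabled=True))   # 0.996
print(pick_score(URIBL_SBL, bayes_enabled=False, net_enabled=True))  # 0.629
```

The zeros in sets 0 and 2 fit the fact that URIBL_SBL is a network test, so it cannot fire when network tests are off.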

- --j.
[snipped PGP SIGNATURE]

Paul Shupak
[EMAIL PROTECTED]

P.S. I understand the political problems with the particular FPs that SPEWS
generates, but I do hope the rfci lists make it to the URIBL rulesets.


RE: URI Tests and Japanese Chars (solved)

2005-03-18 Thread List Mail User
...
Subject: RE: URI Tests and Japanese Chars (solved)
Date: Thu, 17 Mar 2005 17:41:03 -0500
...
From: Rose, Bobby [EMAIL PROTECTED]
To: [EMAIL PROTECTED], Daryl C. W. O'Shea [EMAIL PROTECTED]
Cc: List Mail User [EMAIL PROTECTED], [EMAIL PROTECTED],
users@spamassassin.apache.org
...

But in my test messages the email address wasn't in the form of a URI.
It was just the email address.  I even used pine for a test to make sure
it wasn't a GUI client doing some reformatting business.

Do we know if it's possible to tell whether the results from SBL are for the
domain of the URI being queried, or whether they are due to some
association with that domain?  If so, then we could ignore
any results other than for the domain being queried, or weigh the results
differently, so long as they aren't cumulative points for each
occurrence.  Otherwise, the points would add up the more that person's
email address appears in the email.

It has been suggested before that the indirect name server lookups
be done by a different class of rules and/or scored differently than the direct
lookups - by default the SBL is the only list used for name servers, but on
my servers I use several other lists (and then there is Bugzilla #4106).

-----Original Message-----
[all snipped]

Paul Shupak
[EMAIL PROTECTED]

P.S. Extra points for anyone who actually knows why Bugzilla (or Mozilla) have
zilla in their name (or knows who Tom Paquin is).


Re: URI Tests and Japanese Chars (solved)

2005-03-18 Thread Alan Premselaar
Since you mentioned the scores, please note that Bobby Rose, the original
poster of this issue, had modified the score for URIBL_SBL from its
defaults to 10 ...

I had suggested that he reduce the score (possibly setting it back to
the defaults).

While it doesn't negate the issues surrounding the way the URI lookups
work (or should possibly work) ... it's obvious that there is enough FP
potential to warrant not scoring it so high.

alan

Re: URI Tests and Japanese Chars (solved)

2005-03-18 Thread List Mail User
[all snipped]


Since you mentioned the scores, please note that Bobby Rose, the original
poster of this issue, had modified the score for URIBL_SBL from its
defaults to 10 ...

I had suggested that he reduce the score (possibly setting it back to
the defaults)

While it doesn't negate the issues surrounding the way the URI lookups
work (or should possibly work) ... it's obvious that there is enough FP
potential to warrant not scoring it so high.

alan

I think you are quite correct.  If you want to have a high weight
on the SBL, use it as a RBL at the SMTP level (I do).  I think its score
once a message hits SA is already correct given the extreme overlap with
other hit rules (I have lots of filtering before that - SA is my last line
of defense and seems almost impenetrable).  Even my own local rules generally
have very low scores - only two score above 1.5 and only 5 score above .6,
out of about 25 local rules.  As best I can tell, the default scoring is
very well adjusted already.

Paul Shupak
[EMAIL PROTECTED]


Re: Is this Received header correctly formatted?

2005-03-18 Thread Eric A. Hall
mouss wrote:
Eric A. Hall wrote:
Huh? The helo= stuff is inside the parenthesis. Perhaps I am missing
something but your point 3 seems to conflict with your point 2.
comments are only allowed where whitespace occurs
can you give me the line num in the rfc?
It's actually somewhat stricter than that, and actually says that 
comments can only be used where folding would occur (that's a 
hyper-technical but accurate reading; see the robustness principle).

Here is what rfc2822 says:
3.2.3. Folding white space and comments
 [...]
   There are several places in this standard where comments and FWS may
   be freely inserted.  To accommodate that syntax, an additional token
   for CFWS is defined for places where comments and/or FWS can occur.
   However, where CFWS occurs in this standard, it MUST NOT be inserted
   in such a way that any line of a folded header field is made up
   entirely of WSP characters and nothing else.
FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                        obs-FWS
ctext           =       NO-WS-CTL /     ; Non white space controls
                        %d33-39 /       ; The rest of the US-ASCII
                        %d42-91 /       ;  characters not including "(",
                        %d93-126        ;  ")", or "\"
ccontent        =       ctext / quoted-pair / comment
comment         =       "(" *([FWS] ccontent) [FWS] ")"
CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)
   Throughout this standard, where FWS (the folding white space token)
   appears, it indicates a place where header folding, as discussed in
   section 2.2.3, may take place.  Wherever header folding appears in a
   message (that is, a header field body containing a CRLF followed by
   any WSP), header unfolding (removal of the CRLF) is performed before
   any further lexical analysis is performed on that header field
   according to this standard.  That is to say, any CRLF that appears in
   FWS is semantically invisible.
   A comment is normally used in a structured field body to provide some
   human readable informational text.  Since a comment is allowed to
   contain FWS, folding is permitted within the comment.  Also note that
   since quoted-pair is allowed in a comment, the parentheses and
   backslash characters may appear in a comment so long as they appear
   as a quoted-pair.  Semantically, the enclosing parentheses are not
   part of the comment; the comment is what is contained between the two
   parentheses.  As stated earlier, the \ in any quoted-pair and the
   CRLF in any FWS that appears within the comment are semantically
   invisible and therefore not part of the comment either.
   Runs of FWS, comment or CFWS that occur between lexical tokens in a
   structured field header are semantically interpreted as a single
   space character.
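To make the comment grammar quoted above concrete, here is a rough Python sketch that pulls the top-level comments out of a header field body, honouring nested parentheses and quoted-pairs. It is only an illustration: the function name extract_comments is invented for this example, quoted-strings outside comments are not handled, and it says nothing about where the RFC permits a comment to appear, which is the actual point in dispute in this thread.

```python
def extract_comments(value):
    """Return the top-level comments found in a structured header value."""
    comments = []
    current = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and depth > 0 and i + 1 < len(value):
            # quoted-pair: the backslash itself is semantically invisible
            current.append(value[i + 1])
            i += 2
            continue
        if ch == "(":
            if depth > 0:
                current.append(ch)      # nested comment stays part of the outer one
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                comments.append("".join(current))
                current = []
            else:
                current.append(ch)
        elif depth > 0:
            current.append(ch)
        i += 1
    return comments


header = ("from ar39.lsanca2-4.16.241.28.lsanca2.elnk.dsl.genuity.net "
          "([4.16.241.28] helo=watson1)")
print(extract_comments(header))
```

Run on the Received line under discussion, the whole "[4.16.241.28] helo=watson1" text comes back as a single comment, which is how a lexer following this grammar would see it.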
RFC 2822 is slightly stricter than RFC 822 in this regard. And while 
it's not full standard like 822, it is a standards-track update to 822 
and was sanctioned by the IESG as such, and was developed after years of 
debate over good and bad behavior.

and even then, the original thing was:
Received: from ar39.lsanca2-4.16.241.28.lsanca2.elnk.dsl.genuity.net
([4.16.241.28] helo=watson1)
and here helo=watson1 is inside parens, and with whitespace (before and 
after the parens). or am I missing something?
Check the BNF again.
--
Eric A. Hall  http://www.ehsco.com/
Internet Core Protocols        http://www.oreilly.com/catalog/coreprot/


OT: SURBL usage for content-filters like SquidGuard?

2005-03-18 Thread Jason Haar
Hi there
I was wondering if anyone has written a Squid/proxy redirector filter 
that uses SURBL? It would seem to me the URLs referenced by SURBL are 
Web sites I'd never want to go to? :-)

Maybe it would be only usable via an rsync feed (i.e text file), but the 
data quality should be pretty good...

--
Cheers
Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1


Re: sa-learn help

2005-03-18 Thread Mike Jackson
The problem is that when they forward the email you will lose the
headers and it will think they are the spam/hammers.
No, he said they're forwarding them *as attachments*. All you need to do is 
take the attachments out of the email, and voila, email as the receiver 
received it.
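The "take the attachments out" step can be sketched in Python with the standard email package. This is only a sketch under assumptions: it assumes the mail client attaches the forwarded mail as a message/rfc822 part, and the function name extract_attached_messages is made up for this example. Each returned message could then be written to a file and handed to sa-learn --spam or sa-learn --ham by the cron job.

```python
import email
from email import policy


def extract_attached_messages(raw_bytes):
    """Return every message/rfc822 attachment as bytes, headers intact."""
    outer = email.message_from_bytes(raw_bytes, policy=policy.default)
    found = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the attached message.
            inner = part.get_payload(0)
            found.append(inner.as_bytes())
    return found
```

Note that walk() also descends into the attached message, so a forward nested inside a forward would be returned as well.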

Mike Jackson 



Please help with subject rule

2005-03-18 Thread Roman Serbski
Dear all,

Could you please help me with one SA subject rule that sometimes works
and sometimes doesn't.

SpamAssassin 3.0.2 with qmail-scanner 1.25st.

Everything works like a charm but we receive a lot of spam messages
from yahoo.com group with [expoforum_kg] subject.  I created a rule in
20_head_tests.cf to score all messages containing [expoforum_kg] in a
subject.  I know I shouldn't use global cf rules but I was just
testing.

20_head_tests.cf:

header EXPO_SUCKERS Subject =~ /\b(?:[a-z]([-_.
=~\/:,[EMAIL PROTECTED]+;\\'\\])\1{0,2}){4,}/i
describe EXPO_SUCKERS Subject: contains [expoforum_kg]

50_scores.cf:

score EXPO_SUCKERS 10 10.05 10.07 10.09

Now the problem is that sometimes this rule works but sometimes it is
being ignored.

This is an example of successful detection:

Mon, 14 Mar 2005 18:11:21 KGT:40007: from='Neomarketing
[EMAIL PROTECTED]', subj='[expoforum_kg] A D V E R T I S E - TO -
M I L
 L I O N S', via SMTP from 66.94.237.16
Mon, 14 Mar 2005 18:11:23 KGT:40007: uvscan: finished scan in 1.860183 secs
Mon, 14 Mar 2005 18:11:41 KGT:40007: SA: REPORT hits = 10.6/3.5
1.3 GAPPY_SUBJECT Subject: contains G.a.p.p.y-T.e.x.t
10 EXPO_SUCKERS Subject: contains [expoforum_kg]
1.3 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date
0.5 TARGETED BODY: Targeted Traffic / Email Addresses

Mon, 14 Mar 2005 18:11:41 KGT:40007: SA: yup, this smells like SPAM -
hits=10.6 - rejecting message...
Mon, 14 Mar 2005 18:11:41 KGT:40007: SA: finished scan in 17.88551
secs - hits=10.6
Mon, 14 Mar 2005 18:11:41 KGT:40007: r_e: X-Qmail-Scanner-1.25st: We
have reasons to believe this mail is SPAM

This is an example of unsuccessful detection:

Tue, 15 Mar 2005 18:28:48 KGT:17412: from='Jodi Chu
[EMAIL PROTECTED]', subj='[expoforum_kg] Paid ontime 50%
profit', via SMTP from 66.94.237.41
Tue, 15 Mar 2005 18:28:50 KGT:17412: uvscan: finished scan in 1.859957 secs
Tue, 15 Mar 2005 18:29:06 KGT:17412: SA: REPORT hits = 0.4/3.5
1.0 RATWARE_HASH_2_V2 Bulk email fingerprint (hash 2 v2) found
0.1 TO_EMPTY To: is empty
0.0 RATWARE_HASH_2 Bulk email fingerprint (hash 2) found
0.1 EXCUSE_3 BODY: Claims you can be removed from the list
0.0 EXCUSE_7 BODY: Claims you can be removed from the list
0.3 EXCUSE_REMOVE BODY: Talks about how to be removed from mailings
1.5 URIBL_WS_SURBL Contains an URL listed in the WS SURBL blocklist
[URIs: idv.st]
0.0 MISSING_MIMEOLE Message has X-MSMail-Priority, but no X-MimeOLE

Tue, 15 Mar 2005 18:29:06 KGT:17412: SA: required_hits 3.5 /
sa_quarantine +2.1 / sa_delete +4.2
Tue, 15 Mar 2005 18:29:06 KGT:17412: SA: finished scan in 16.069264
secs - hits=0.4

Any ideas would be greatly appreciated.

Thank you.
Roman
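A possible explanation, offered tentatively: the pattern in the rule above looks like a gappy-text pattern (letters separated by punctuation or spaces) rather than a literal match on the list tag, which would fit the fact that only the spaced-out "A D V E R T I S E" subject hit, and that the same message also triggered GAPPY_SUBJECT. The archived regex is partly mangled, so it cannot be reproduced here. The Python sketch below shows a literal match on the tag instead; the name EXPO_TAG is just for this example. In a SpamAssassin rule the corresponding test would presumably be Subject =~ /\[expoforum_kg\]/i.

```python
import re

# Literal match for the list tag. The brackets are escaped because an
# unescaped [...] would be read as a character class.
EXPO_TAG = re.compile(r"\[expoforum_kg\]", re.IGNORECASE)

subjects = [
    "[expoforum_kg] A D V E R T I S E - TO - M I L L I O N S",
    "[expoforum_kg] Paid ontime 50% profit",
]

for subject in subjects:
    print(bool(EXPO_TAG.search(subject)), subject)
```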


Re: bayes test

2005-03-18 Thread crisppy fernandes
 I wonder how I have to train spamassassin to get bayes_XX test start
 working.
 
 I have a rule that trains the bayessian filter with each email y
 received with the sa-learn tool.

You have not mentioned that rule or the file in which you have written
it. If you can tell us, it will help others to reply better.
Anyway, let me try to explain.

BAYES_XX works purely on the basis of probability. It tries to find
tokens in the mail which match earlier learned tokens. It's always better
that Bayes learns by itself, but we can always create rules to improve the
chances of that rule appearing alongside other tests.
The Bayes rules have default scores, which you can check in the files
under the /usr/share/spamassassin/ directory.
User-created rules can be written either in
/etc/mail/spamassassin/local.cf or in the user-specific user_prefs file
in the user's home directory.
After writing any rule or changing any score, do not forget to run the command 
spamassassin --lint
and for debugging you can add the -D option.


 After some months of training (I
 thought I needed 200 of spam and 200 of ham) I haven't seen it yet.
 The last spam my spamassassin caught it had these tests:

Yes, it's mentioned in the SpamAssassin wiki documentation, but in reality
it takes much more than this.
Read man sa-learn; that will help you understand the process better.

For further queries mail to the list.
-- 
Crisppy Fernandes


Is spamassassin 3.0.2 wrked for any one just after install or upgrade

2005-03-18 Thread crisppy fernandes
Dev community,

I would like to know from the developer community whether SpamAssassin
worked for anyone right after an upgrade or install.
Every day one or another new user complains that SpamAssassin does not
seem to work after upgrading to a 3.0.x version.
After checking the man pages or the wiki we learn that we should make
it learn 200 spam and 200 ham and then it will work. But even then it does
not actually work. According to the sa-learn documentation, corpora are
not an exact way to check validity.
So is there any other, easier way by which a novice can work
with SpamAssassin without needing to bother about learning and all that?
After going through the documentation I understand that it
learns automatically on the basis of its different rules.
But what about users who don't have a big load of spam on their servers?

What I simply want to point out here is that spamassassin.org should provide
some procedure which makes users' work easy, so that they feel happy using
this software.

-/Crisppyf


Re: OT: SURBL usage for content-filters like SquidGuard?

2005-03-18 Thread Jeff Chan
On Thursday, March 17, 2005, 7:13:32 PM, Jason Haar wrote:
 I was wondering if anyone has written a Squid/proxy redirector filter
 that uses SURBL?

Bill Stearns has some instructions for using Squid, Privoxy and
other programs with sa-blacklist, which is the data source that
goes into ws.surbl.org, at:

  http://www.stearns.org/sa-blacklist/README.howtouse.html

 It would seem to me the URLs referenced by SURBL are
 Web sites I'd never want to go to? :-)

Perhaps, though we would probably not want to make that decision
in a shared or public environment.  Bear in mind that the SURBL
data is strongly biased towards URIs that appear in spam.  While
it's true that most people would probably not want to visit spam
sites, they could be useful for spam research, etc.

 Maybe it would be only usable via an rsync feed (i.e text file), but the 
 data quality should be pretty good...

Bill allows web grabs of sa-blacklist, but SURBLs are usually
used through DNS queries, or via rsync only for high-volume mail servers.
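For the DNS route, a redirector would look up the candidate domain under the list's zone, in the usual DNSBL fashion. The Python sketch below assumes that convention (a listed domain resolves, an unlisted one does not) and uses the ws.surbl.org zone named above, which may have changed since; both function names are made up for this example.

```python
import socket


def surbl_query_name(domain, zone="ws.surbl.org"):
    """Build the DNS name to look up for a domain in a SURBL-style zone."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, zone="ws.surbl.org"):
    """Return True if the query name resolves (needs network access).

    A resolver failure is also reported as "not listed" here, which a
    real redirector would want to treat more carefully.
    """
    try:
        socket.gethostbyname(surbl_query_name(domain, zone))
        return True
    except socket.gaierror:
        return False


print(surbl_query_name("Example.COM"))
```

A Squid redirector would need to reduce each requested URL to its registered domain before calling is_listed, and would probably want to cache answers.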

You may want to discuss this further on the SURBL discussion list:

  http://lists.surbl.org/

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Bayes DB does not grow anymore

2005-03-18 Thread GRP Productions
Thanks for the offer. You can send it to the email address I use for this
list, or you could just send me an FTP URL for retrieval.
Sorry I did not find the time to do this, but I will try to send it during
the weekend.

Oh, yes. You need to have SURBL switched on via the init.pre (I think it's
off by default) and you should use custom rules. I use a set of carefully
chosen rulesets mostly from SARE and updated via rulesdujour and some more
rules of my own accumulated over time.
It seems SURBL is now enabled by default. It has also changed its name to
URIDNSBL :-) I do not use SARE rules (although I am trying to find time to
look at them, as I am aware of their credibility). I use Gray's rules
(http://files.grayonline.id.au), they seem quite efficient.

I think on a heavy traffic machine it's preferable to have it off,
especially when using MailScanner. Otherwise the expiry can kick in at
random times every few hours (you can set a minimum time, though, e.g. one
day). Some people run a scheduled expiry three times a day. That's advice
which often comes up on the MailScanner list (which is a very helpful list,
btw).
Depends on how often you need it (whether it reaches the limit you want to
hold more often or not). Starting with one expiry per night should be fine,
but you should occasionally expire manually and look at the output, in case
there are problems.

No. One should get rid of really old tokens, they are only ballast in the
db. I don't know how a big db behaves on a busy site. Ours contain 1 million
tokens and have a size of 40 MB. They work very well with no resource
hogging. But I have only a few thousand messages running through each of our
servers, there's probably none which gets more than 10,000 a day. If you get
100,000 it may be different.
I understand what you say. The point is, what should the criteria be for
deciding that the time for an expiration has come? I mean, taking only the
size into consideration could be a problem. What if some old tokens are
still common nowadays in spam mail? You could say it doesn't matter, it
will be started again and recognize all the bad stuff. In that sense, we
could just stop maintaining Bayes completely.

That's what we do. I only learn messages which were categorized wrong. Not
by Bayes, but by SA. Most messages which get a score lower than 5 still get
a BAYES_99 which means that Bayes identifies them all. Nevertheless, I learn
these messages because they are spam and it reassures Bayes that they are
spam.
BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.
As I told you, since my last post I have reset everything.  It seems to me
it works fine, and it learns rapidly. It gives me no reason not to trust it,
to the degree that I have set my SA score to be more or less equal to the
BAYES_99 score (around 8). Of course I keep doing mistake-based learning,
but most of the time I feed it with 'subjective' spam mail (i.e. mail that
my users don't want to receive, but is definitely not spam). I monitor it
constantly and I am happy about it.

No problem :-) I tend to be a bit snappy on first messages which look to me
like the author could have done a bit more research, but once we are over
that stage I hope I can give some good advice based on my experience.
I have to admit that our communication was valuable to me, I learned so much
about how the whole thing works. Once again, I appreciate it.

Greg



Re: bayes test

2005-03-18 Thread aktor
Hi,

On Fri, 18 Mar 2005 10:54:08 +0530
crisppy fernandes wrote:

 You have not mentioned that rule and file in which you have written
 that rule. if you can tell then it will help others to reply better.
 anyway let me try to explain

I haven't written any rule by myself. I thought it should start
learning by itself.

Both files 

/etc/spamassassin/local.cf
~/.spamassassin

don't have any directives, as I use the amavisd-new default settings.

  After some months of training (I
  thought I needed 200 of spam and 200 of ham) I haven't seen it yet.
  The last spam my spamassassin caught it had these tests:
 
 yes its mentioned in spamassassin wiki documentation but reality is
 much more than this.

OK. That's going to be the problem. What is the real number of emails
needed for the Bayesian filter to start working?

Thx,

aktor
-- 
Buy a MODEM, surf the Internet: gain friends and lose your wife.
-- Www.frases.com. 

This mail is copyleft-ed to aktor under the terms of the CC License
(Creative Commons). 




Re: Is this Received header correctly formatted?

2005-03-18 Thread mouss
Eric A. Hall wrote:
Huh? The helo= stuff is inside the parenthesis. Perhaps I am missing
something but your point 3 seems to conflict with your point 2.

comments are only allowed where whitespace occurs
can you give me the line num in the rfc?
and even then, the original thing was:
Received: from ar39.lsanca2-4.16.241.28.lsanca2.elnk.dsl.genuity.net
([4.16.241.28] helo=watson1)
and here helo=watson1 is inside parens, and with whitespace (before and 
after the parens). or am I missing something?

regards,
mouss


Re: Is this Received header correctly formatted?

2005-03-18 Thread mouss
List Mail User wrote:
...
Date: Thu, 17 Mar 2005 00:29:43 +0100
From: mouss [EMAIL PROTECTED]
...
To: List Mail User [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
  [EMAIL PROTECTED]
Subject: Re: Is this Received header correctly formatted?
...
List Mail User wrote:

In other words, lowercase is conformant. and your first point is
not correct (though all the examples do show uppercase).  However, you are
completely correct that the helo= is flat out wrong,
why? it's inside a comment, no?
but with a slight
variation, and it becomes something like (watson1 [4.16.241.28]) which
is not only conformant, but is the typical behavior of both sendmail
and postfix.
except that here the situation is reversed.
while postfix and sendmail use "from heloname (client_name 
[client_ip])", others such as qmail prefer "from client_name 
([client_ip]) (helo heloname)" or other variants.


Mous,
You're correct about the reversal, I realized that *after* I sent
the message.  Also technically the area after the [client_ip] is not white
space.  Eric properly pointed out that the entire header field already has
an assigned use, and the comment in the definition states
specifically not to use information from the HELO.
To requote:
TCP-info = Address-literal / ( Domain FWS Address-literal )
  ; Information derived by server from TCP connection
  ; not client EHLO.
that says what should be inside, not in comments. or are you meaning 
that qmail's:
	Received: (from the network ...); ...
is illegal?
you might, but you'd better come with real arguments.



Notice the definition does not use any specification for white space after
the address literal.  The single space character does not count - The
notation uses that to delineate between atoms and/or tokens; There would have
to be a reference to either FWS, WSP or maybe even LWSP might qualify;
But since none of those atoms are part of the definition, the area after the
literal and before the ')' does not qualify as white space.  So the clause
([4.16.241.28] helo=watson1) seems to be clearly non-conformant.  Also, the
inclusion of the parenthesis seems to be incorrect for a bare literal; They
are only specified for the second alternative with both the Domain and
Address-literals.  I do agree that it is not enough of an error that mail
should be refused on that basis alone, but if a server were to do so, it
would be within its prerogative (and seemingly legal to do so).
Paul Shupak
[EMAIL PROTECTED]



Re: bayes test

2005-03-18 Thread aktor
Hi again,

On Fri, 18 Mar 2005 10:54:08 +0530
crisppy fernandes wrote:

 Any rule you write or scores you change do not forget to run the
 command  spamassassin --lint
 and for debugging you can add -D option.

AsteriX root # amavisd-new debug-sa
[..]
debug: bayes: 20621 tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks
debug: bayes: 20621 tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, only 49 spam(s) in Bayes DB < 200
debug: bayes: 20621 untie-ing
debug: bayes: 20621 untie-ing db_toks
debug: bayes: 20621 untie-ing db_seen
debug: Score set 0 chosen.

I've got this architecture..

postfix -> amavisd-new -> postfix -> maildrop -> sa-learn -> mailbox
             |      |
             V      V
          clamav   spamassassin

So I would like to load per-user bayes_toks and bayes_seen files.

I think my problem is that the only files used by spamassassin are
/var/lib/amavis/.spamassassin/bayes_*, and no per-user ones.

AsteriX root # sa-learn --dump magic --dbpath /var/lib/amavis/.spamassassin/

0.0000  3 0  non-token data: bayes db version 
0.0000 49 0  non-token data: nspam 
   ^^
0.0000   5240 0  non-token data: nham 
0.0000 164819 0  non-token data: ntokens 
0.0000 1106523114 0  non-token data: oldest atime 
0.0000 139568 0  non-token data: newest atime 
0.0000 1106526477 0  non-token data: last journal sync atime  
0.0000 123833 0  non-token data: last expiry atime 
0.0000  0 0  non-token data: last expire atime delta
0.0000  0 0  non-token data: last expire reduction count


[EMAIL PROTECTED] aktor $ sa-learn --dump magic
0.0000  3 0  non-token data: bayes db version
0.0000    572 0  non-token data: nspam
  ^^^
0.0000   1996 0  non-token data: nham
0.0000 203323 0  non-token data: ntokens
0.0000 1086896787 0  non-token data: oldest atime
0.0000 127201 0  non-token data: newest atime
0.0000  0 0  non-token data: last journal sync atime 
0.0000 102285 0  non-token data: last expiry atime 
0.0000   29436939 0  non-token data: last expire atime delta 
0.0000  0 0  non-token data: last expire reduction cou
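
The comparison between the two dumps can be automated. The following is only a sketch: it assumes the counter value is the second-to-last field before the "non-token data:" label (as in the lines above), takes the 200-message minimum from the debug output, and uses helper names invented for this example.

```python
MIN_TRAINED = 200  # minimum quoted in the "Not available for scanning" debug line


def magic_counts(dump_text):
    """Collect the counters from `sa-learn --dump magic` output into a dict."""
    counts = {}
    for line in dump_text.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        if not sep or len(fields) < 2:
            continue  # not a counter line
        if fields[-2].isdigit():
            counts[label.strip()] = int(fields[-2])
    return counts


def bayes_ready(counts, minimum=MIN_TRAINED):
    """Bayes scoring needs at least `minimum` spam and `minimum` ham learned."""
    return counts.get("nspam", 0) >= minimum and counts.get("nham", 0) >= minimum


# Two lines copied from the amavis database dump above.
amavis_dump = """\
0.0000 49 0  non-token data: nspam
0.0000   5240 0  non-token data: nham
"""
amavis_counts = magic_counts(amavis_dump)
amavis_ok = bayes_ready(amavis_counts)

# Counts read off the per-user dump above.
user_ok = bayes_ready({"nspam": 572, "nham": 1996})
```

With these numbers the amavis database falls short on spam (49 of 200), while the per-user database would qualify if it were the one being consulted.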

Is there any way to solve this?

Thx,

aktor
-- 
Man can still switch off the computer. However, we will have to
work hard to keep this privilege.
-- J. Weizembaum. American sociologist and computer
expert. 

This mail is copyleft-ed to aktor under the terms of the CC License
(Creative Commons). 




Re: Is this Received header correctly formatted?

2005-03-18 Thread jdow
From: mouss [EMAIL PROTECTED]

 Eric A. Hall wrote:
 
 
  Huh? The helo= stuff is inside the parentheses. Perhaps I am missing
  something but your point 3 seems to conflict with your point 2.
  
  
  comments are only allowed where whitespace occurs
  
 
 can you give me the line number in the RFC?
 
 and even then, the original thing was:
 Received: from ar39.lsanca2-4.16.241.28.lsanca2.elnk.dsl.genuity.net
 ([4.16.241.28] helo=watson1)
 and here helo=watson1 is inside parens, and with whitespace (before and 
 after the parens). or am I missing something?

It IS Microsoft. I know that for certain. That machine is sitting about
10' to the East of me at this moment. My Received: header will be in
a similar format, with kittycat as the helo. These are the computer
names on the local network isolated from the outside network by a Linux
firewall.

I am *NOT* about to rename these machines by the incomprehensible,
impossible to type from memory, and changeable name assigned to the
firewall interface.

I do NOT run a mail server for sending mail to the Internet on the
firewall machine. I do not, at this time, intend to. If we get a static
IP I might consider firing up a suitably screwed down Postfix for direct
incoming and outgoing email rather than the fetchmail configuration in
use at the moment.

I fully realize that Microsoft is well known to "embrace and
extend" (otherwise known as screw up) common standards for their own
incomprehensible reasons. (Most often it's probably some jerk "genius"
programming it who might declare, "Gee, I didn't think of that!" An
example of that is the means by which I, were I a malware author,
could render your machine mysteriously unbootable in anything but
safe mode, simply because Microsoft did not think of the consequences
of a change they put into SP2. A product I make happened to trigger
this defect. I had to find a way around it.) Anyway, the point of
this is that rejecting that format will reject a very large proportion
of the mail that comes from Outlook Express users.

Personally, I don't give a fleeking furglemonk whether you do or not.
I'm simply telling you what the facts of the situation are so that
you can make your own determination whether you want to block email
from a VERY large segment of the legitimate email crossing the net
today. Then you can take responsibility for lost or rejected email
for yourself. (If you have customers involved be aware this may
constitute a liability situation for you personally and your
company.)

{^_^}   Joanne
PS: The actual firewall machine is imaginatively named it.
If you dig in the headers enough maybe you can even figure out
the internal network particulars. It is NOT going to change
because somebody is needlessly particular about header formats.



Re: Is this Received header correctly formatted?

2005-03-18 Thread mouss
List Mail User wrote:
...
Date: Thu, 17 Mar 2005 00:29:43 +0100
From: mouss [EMAIL PROTECTED]
...
To: List Mail User [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
  [EMAIL PROTECTED]
Subject: Re: Is this Received header correctly formatted?
...
List Mail User wrote:

In other words, lowercase is conformant. and your first point is
not correct (though all the examples do show uppercase).  However, you are
completely correct that the helo= is flat out wrong,
why? it's inside a comment, no?
but with a slight
variation, and it becomes something like (watson1 [4.16.241.28]) which
is not only conformant, but is the typical behavior of both sendmail
and postfix.
except that here the situation is reversed.
while postfix and sendmail use from heloname (client_name 
[client_ip]), others such as qmail prefer from client_name 
([client_ip]) (helo heloname) or other variants.


Mous,
You're correct about the reversal, I realized that *after* I sent
the message.  Also technically the area after the [client_ip] is not white
space.  Eric properly pointed out that the entire header field already has
an assigned use already, and the comment in the definition states
specifically not to use information from the HELO.
To requote:
TCP-info = Address-literal / ( Domain FWS Address-literal )
  ; Information derived by server from TCP connection
  ; not client EHLO.
Notice the definition does not use any specification for white space after
the address literal.  The single space character does not count - The
notation uses that to delineate between atoms and/or tokens; There would have
to be a reference to either FWS, WSP or maybe even LWSP might qualify;
But since none of those atoms are part of the definition, the area after the
literal and before the ')' does not qualify as white space.  So the clause
([4.16.241.28] helo=watson1) seems to be clearly non-conformant. 
ahem. the specs provide for comments, and don't restrict comments. so 
whatever is in between parens is ok. the specs even allow silly things 
like Fr(foo)om. btw, unlike what a lot of people seem to think, rfc2821 
is only a 'standard track'.

 Also, the
inclusion of the parenthesis seems to be incorrect for a bare literal; 
as far as this is in comments, there is no issue.  so
Received: from foo (whatever is here)
is ok.
They
are only specified for the second alternative with both the Domain and
Address-literals.  I do agree that it is not enough of an error that mail
should be refused on that basis alone, but if a server were to do so, it
would be within its prerogative (and seemingly legal to do so).
as far as I can see, the std allows for a lot of received stuff. the std 
even manages to create a notion of domain that is not compatible with a 
dns domain. after all, smtp has apparently been defined by sendmail



Time in the log file is incorrect?

2005-03-18 Thread David Suen
Hi all,

I just read my spamd log file and I found that the time in the log is
incorrect. I just sent an email to myself and here is the log:
@4000423abce8189d6a1c 2005-03-18 11:34:54 [21095] snip

whereas right now the time is Fri Mar 18 22:36:42 EST 2005.

I have ntp installed, so that should not be the problem. Do you guys know the
reason why it is incorrect?

Although it is not a big issue, it may cause the problem with my log
analyzer.

Thanks
David



Spammers Target Secondary MX hosts?

2005-03-18 Thread Yang Xiao
Hi all,
I've noticed lately that almost 90% of the email coming in through
our secondary MX host is spam. I just want to know if there's an
explanation for this. My guess is that spammers target the secondary
MX host intentionally, for some reason I can't understand, maybe hoping
the secondary host will be configured with less care?

Many thanks,

Yang


Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Martin Hepworth
I think the reason is that they think we might trust the secondary MX 
more than anything else and therefore let it through without checks.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300
Yang Xiao wrote:
Hi all,
I've been noticing it lately that almost 90% of emails come in through
our secondary MX host are spams, I just want to know if there's an
explanation for this, my guess is that the spammers spam the secondary
MX host intentionally for some reason I can't understand, maybe hoping
the secondary host will configured with less care?
Many thanks,
Yang


Network Tests

2005-03-18 Thread Daniel A. de Araujo



Hi guys,

I have SpamAssassin 2.63 with Amavis installed on my box, and I am now
trying to enable network tests with SpamCopURI.
It's working, but message delivery is very slow when network tests are
enabled, so I had to disable them.

Any ideas for making message delivery faster with network tests enabled?

Thanks a lot,
Daniel.




Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Yang Xiao
On Fri, 18 Mar 2005 13:48:46 +, Duncan Hill [EMAIL PROTECTED] wrote:
 On Friday 18 March 2005 13:09, Yang Xiao typed:
  Hi all,
  I've been noticing it lately that almost 90% of emails come in through
  our secondary MX host are spams, I just want to know if there's an
  explanation for this, my guess is that the spammers spam the secondary
  MX host intentionally for some reason I can't understand, maybe hoping
  the secondary host will configured with less care?
 
 In a large number of cases, the secondary MX is not configured to know the
 list of valid users etc, and may be configured to pass directly to the
 internal mail server, bypassing protections on the primary relay.

hm... I'd be interested to know what the percentage is for this
kind of setup, just to feed my curiosity, because it totally
doesn't make sense to me. It's like setting up a secondary firewall
with no blocking rules; what good is it?

Yang


Re: Is this Received header correctly formatted?

2005-03-18 Thread List Mail User
...
Date: Fri, 18 Mar 2005 03:40:20 +0100
From: mouss [EMAIL PROTECTED]
...
Subject: Re: Is this Received header correctly formatted?
...

List Mail User wrote:
...
Date: Thu, 17 Mar 2005 00:29:43 +0100
From: mouss [EMAIL PROTECTED]
...
To: List Mail User [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
   [EMAIL PROTECTED]
Subject: Re: Is this Received header correctly formatted?
...

List Mail User wrote:


In other words, lowercase is conformant. and your first point is
not correct (though all the examples do show uppercase).  However, you are
completely correct that the helo= is flat out wrong,

why? it's inside a comment, no?

 but with a slight

variation, and it becomes something like (watson1 [4.16.241.28]) which
is not only conformant, but is the typical behavior of both sendmail
and postfix.

except that here the situation is reversed.
while postfix and sendmail use from heloname (client_name 
[client_ip]), others such as qmail prefer from client_name 
([client_ip]) (helo heloname) or other variants.


 
  Mous,
 
  You're correct about the reversal, I realized that *after* I sent
 the message.  Also technically the area after the [client_ip] is not white
 space.  Eric properly pointed out that the entire header field already has
 an assigned use already, and the comment in the definition states
 specifically not to use information from the HELO.
 
 To requote:
 
 TCP-info = Address-literal / ( Domain FWS Address-literal )
   ; Information derived by server from TCP connection
   ; not client EHLO.


that says what should be inside, not in comments. or are you meaning 
that qmail's:
   Received: (from the network ...); ...
is illegal?
you might, but you'd better come with real arguments.

[end of history - start of actual response]

Actually, I have to admit that, without checking, I usually just
assume qmail is wrong ;)  But in this case the "Received: (from the network ...)"
line (actually, all the qmail examples a quick check turned up were of the form
"(invoked ...", but the argument is the same) is conformant, because the format
for a Received: line is defined by:

RFC 2822 Section 3.6.7
...
received        =   "Received:" name-val-list ";" date-time CRLF

name-val-list   =   [CFWS] [name-val-pair *(CFWS name-val-pair)]
...

And the CFWS is exactly what Eric pointed to before as the case
where comments are allowed.  What you seem to be missing is that a space
in the BNF is *not* white space, but just a delimiter.  You need to check
what is in RFC 2234, as specifically mentioned in RFC2822 Section 1.2.2.
What is white space is always denoted as one of WSP or LWSP (RFC2234
Sections 4 and 6.1).  RFC2822 Section 3.2.3 introduces FWS and CFWS for
the purposes of that document.  Comments are allowed in headers wherever
CFWS is used in the BNF - The definition of a comment (for RFC2822) is given
as:

RFC 2822 Section 3.2.3
...
   There are several places in this standard where comments and FWS may
   be freely inserted.  To accommodate that syntax, an additional token
   for CFWS is defined for places where comments and/or FWS can occur.
   However, where CFWS occurs in this standard, it MUST NOT be inserted
   in such a way that any line of a folded header field is made up
   entirely of WSP characters and nothing else.

FWS             =   ([*WSP CRLF] 1*WSP) /   ; Folding white space
                    obs-FWS
...
comment         =   "(" *([FWS] ccontent) [FWS] ")"

CFWS            =   *([FWS] comment) (([FWS] comment) / FWS)
...

Note that when a comment appears as part of CFWS it is required
to have parentheses around it, so the helo=watson1 clause which
started all of this mess is, again, not valid.  It does seem that a line
containing (helo=watson1) [4.16.241.28] would be legal, but it would seem
to violate the spirit of the law, which says (paraphrased) that the data is
not derived from the EHLO.  Note that the parentheses are required around
comments (the BNF specifies them as quoted literal characters, as shown above).

Anyway the qmail case would be parsed as a received line with a
perfectly legal comment at the beginning of an otherwise empty name-val-list
and a required date-time at the end.  Certainly not optimal, but legal (I
would want to see a name-val-list containing at least one name-val-pair,
as it is of more interest than the comment of invoked ... in some fashion).

[more thread history below]


 Notice the definition does not use any specification for white space after
 the address literal.  The single space character does not count - The
 notation uses that to delineate between atoms and/or tokens; There would have
 to be a reference to either FWS, WSP or maybe even LWSP might qualify;
 But since none of those atoms are part of the definition, the area after the
 literal and before the ')' does not qualify as white space.  So the clause
 ([4.16.241.28] helo=watson1) seems to be clearly non-conformant.  Also, 

Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Kai Schaetzl
Yang Xiao wrote on Fri, 18 Mar 2005 08:09:24 -0500:

 I've been noticing it lately that almost 90% of emails come in through 
 our secondary MX host are spams, I just want to know if there's an 
 explanation for this, my guess is that the spammers spam the secondary 
 MX host intentionally for some reason I can't understand, maybe hoping 
 the secondary host will configured with less care?


Yes, that seems to be the idea.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de  http://msie.winware.org





RE: Please help with subject rule

2005-03-18 Thread Bowie Bailey
From: Roman Serbski [mailto:[EMAIL PROTECTED]
 
 Dear all,
 
 Could you please help me with one SA subject rule that sometimes works
 and sometimes doesn't.
 
 SpamAssassin 3.0.2 with qmail-scanner 1.25st.
 
 Everything works like a charm but we receive a lot of spam messages
 from yahoo.com group with [expoforum_kg] subject.  I created a rule in
 20_head_tests.cf to score all messages containing [expoforum_kg] in a
 subject.  I know I shouldn't use global cf rules but I was just
 testing.
 
 20_head_tests.cf:
 
 header EXPO_SUCKERS Subject =~ /\b(?:[a-z]([-_.
 =~\/:,[EMAIL PROTECTED]+;\\'\\])\1{0,2}){4,}/i
 describe EXPO_SUCKERS Subject: contains [expoforum_kg]
 
 
 This is an example of successful detection:
 
 subj='[expoforum_kg] A D V E R T I S E - TO - M I L L I O N S'
 
 This is an example of unsuccessful detection:
 
 subj='[expoforum_kg] Paid ontime 50% profit'

The problem is that your rule is matching the expanded text seen in the
first subject rather than the '[expoforum_kg]' that you seem to expect.  Try
this rule instead:

header EXPO_SUCKERS Subject =~ /\b\[expoforum_kg\]\b/i
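
One caveat may be worth checking before deploying the rule above: \b only matches between a word character and a non-word character, and neither "[" at the start of the subject nor "]" followed by a space provides one. The quick check below uses Python's re module as a stand-in for Perl (the two agree on \b for plain ASCII like this) against the two sample subjects from the original post. The variant without \b is a suggestion, not part of the reply.

```python
import re

# The two sample subjects quoted in the original question.
subjects = [
    "[expoforum_kg] A D V E R T I S E - TO - M I L L I O N S",
    "[expoforum_kg] Paid ontime 50% profit",
]

# Pattern as posted, and a variant with the \b anchors dropped.
with_boundaries = re.compile(r"\b\[expoforum_kg\]\b", re.IGNORECASE)
without_boundaries = re.compile(r"\[expoforum_kg\]", re.IGNORECASE)

hits_with = [bool(with_boundaries.search(s)) for s in subjects]
hits_without = [bool(without_boundaries.search(s)) for s in subjects]

for subject, a, b in zip(subjects, hits_with, hits_without):
    print(f"{subject!r}: with \\b={a}, without \\b={b}")
```

If the same holds under SpamAssassin, the equivalent rule would simply be Subject =~ /\[expoforum_kg\]/i; running it against a saved sample message is the safest way to confirm.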

Bowie


Re: Please help with subject rule

2005-03-18 Thread Evan Platt
At 08:58 PM 3/17/2005, you wrote:
Everything works like a charm but we receive a lot of spam messages
from yahoo.com group with [expoforum_kg] subject.  I created a rule in
20_head_tests.cf to score all messages containing [expoforum_kg] in a
subject.  I know I shouldn't use global cf rules but I was just
testing.
Unless I'm missing the point... [EMAIL PROTECTED] 
would be a much better solution. :)

Evan  



Gray's rules?

2005-03-18 Thread Bowie Bailey
I just came across a mention of these rules in another post.  I am already
using quite a few of the SARE rules and am wondering whether it would be
useful to add these to my server.  Has anyone done any mass checks on these
rules?  If they will increase spam detection, I'd love to add them in, but I
don't want to significantly increase my false positive rate (which is near
zero at the moment).

Gray's rules 
(http://files.grayonline.id.au)

Thanks,
Bowie


Re: Whitelist Question

2005-03-18 Thread Bryan Haase
I am not sure if there is a whitelist_subject, but an allow rule would accomplish 
the same thing:

header    SUBJ_ALLOW_RULE_1  Subject =~ /words go here/i
describe  SUBJ_ALLOW_RULE_1  Subject ALLOW Rule for words go here
score     SUBJ_ALLOW_RULE_1  -15.0

--Bryan


 Timothy Richter [EMAIL PROTECTED] 03/17/05 03:51PM 
Good Afternoon,

I have made whitelist_from exceptions and whitelist_to exceptions.  Is it 
possible to make a exception in the whitelist file by subject?  I am guessing 
it would be whitelist_subject .

Thanks,

Tim



Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Kurt Boyack
A secondary MX host will get mostly spam. Mailers that follow the
rules will use the MX records as they were intended. Spammers scan all
hosts for port 25 and send email through them any way they can. You
can put a machine on the Internet without any MX records and spam will
start flowing through it. It usually does not take them very long to
discover a mail server.

The upside is that the spam can be used for testing new versions of
SpamAssassin. :)


On Fri, 18 Mar 2005 08:09:24 -0500, Yang Xiao [EMAIL PROTECTED] wrote:
 Hi all,
 I've been noticing it lately that almost 90% of emails come in through
 our secondary MX host are spams, I just want to know if there's an
 explanation for this, my guess is that the spammers spam the secondary
 MX host intentionally for some reason I can't understand, maybe hoping
 the secondary host will configured with less care?
 
 Many thanks,
 
 Yang



Re: Is this Received header correctly formatted?

2005-03-18 Thread List Mail User
...
Date: Thu, 17 Mar 2005 00:29:43 +0100
From: mouss [EMAIL PROTECTED]
...
To: List Mail User [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
   [EMAIL PROTECTED]
Subject: Re: Is this Received header correctly formatted?
...

List Mail User wrote:


In other words, lowercase is conformant. and your first point is
not correct (though all the examples do show uppercase).  However, you are
completely correct that the helo= is flat out wrong,

why? it's inside a comment, no?

 but with a slight

variation, and it becomes something like (watson1 [4.16.241.28]) which
is not only conformant, but is the typical behavior of both sendmail
and postfix.

except that here the situation is reversed.
while postfix and sendmail use from heloname (client_name 
[client_ip]), others such as qmail prefer from client_name 
([client_ip]) (helo heloname) or other variants.


 
  Mous,
 
  You're correct about the reversal, I realized that *after* I sent
 the message.  Also technically the area after the [client_ip] is not white
 space.  Eric properly pointed out that the entire header field already has
 an assigned use already, and the comment in the definition states
 specifically not to use information from the HELO.
 
 To requote:
 
 TCP-info = Address-literal / ( Domain FWS Address-literal )
   ; Information derived by server from TCP connection
   ; not client EHLO.
 
 Notice the definition does not use any specification for white space after
 the address literal.  The single space character does not count - The
 notation uses that to delineate between atoms and/or tokens; There would have
 to be a reference to either FWS, WSP or maybe even LWSP might qualify;
 But since none of those atoms are part of the definition, the area after the
 literal and before the ')' does not qualify as white space.  So the clause
 ([4.16.241.28] helo=watson1) seems to be clearly non-conformant. 

ahem. the specs provide for comments, and don't restrict comments. so 
whatever is in between parens is ok. the specs even allow silly things 
like Fr(foo)om. btw, unlike what a lot of people seem to think, rfc2821 
is only a 'standard track'.

I've made this argument myself, but it has been upgraded to
Best Practices.

Also, your Fr(foo)om case is not allowed, because as you can read
below a comment is to be parsed as if it were a single space character, so
your example would parse to Fr om which is meaningless.

Anyway, let's go back to RFC822, which is a Standard and still stands
despite the intention for 2822 to replace it.  To quote the `old' restriction
on comments:

RFC822 Section 3.4.3
3.4.3.  COMMENTS

A comment is a set of ASCII characters, which is  enclosed  in
matching  parentheses  and which is not within a quoted-string
The comment construct permits message originators to add  text
which  will  be  useful  for  human readers, but which will be
ignored by the formal semantics.  Comments should be  retained
while  the  message  is subject to interpretation according to
this standard.  However, comments  must  NOT  be  included  in
other  cases,  such  as  during  protocol  exchanges with mail
servers.

Comments nest, so that if an unquoted left parenthesis  occurs
in  a  comment  string,  there  must  also be a matching right
parenthesis.  When a comment acts as the delimiter  between  a
sequence of two lexical symbols, such as two atoms, it is lex-
ically equivalent with a single SPACE,  for  the  purposes  of
regenerating  the  sequence, such as when passing the sequence
onto a mail protocol server.  Comments are  detected  as  such
only within field-bodies of structured fields.

If a comment is to be folded onto multiple lines,  then  the
syntax  for  folding  must  be  adhered to.  (See the Lexical
Analysis of Messages section on Folding Long Header  Fields
above,  and  the  section on Case Independence below.)  Note
that  the  official  semantics  therefore  do  not  see  any
unquoted CRLFs that are in comments, although particular pars-
ing programs may wish to note their presence.  For these  pro-
grams,  it would be reasonable to interpret a CRLF LWSP-char
as being a CRLF that is part of the comment; i.e., the CRLF is
kept  and  the  LWSP-char is discarded.  Quoted CRLFs (i.e., a
backslash followed by a CR followed by a  LF)  still  must  be
followed by at least one LWSP-char.

and

RFC822 Section 3.4.6
3.4.6.  BRACKETING CHARACTERS

There is one type of bracket which must occur in matched pairs
and may have pairs nested within each other:

o   Parentheses ("(" and ")") are used  to  indicate  comments.
...

So even in RFC822, comments require parentheses.
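
The rule quoted above (a comment is lexically equivalent to a single SPACE, and comments nest) can be sketched in a few lines. This is an illustration only, not a full RFC 822 lexer: it handles nesting, quoted-strings and backslash quoted-pairs, ignores folding, and the function name is invented for the example.

```python
def strip_comments(field_body: str) -> str:
    """Replace each top-level parenthesized comment with one space."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string (only tracked outside comments)
    i = 0
    while i < len(field_body):
        ch = field_body[i]
        if ch == "\\" and i + 1 < len(field_body):
            # quoted-pair: the next character is literal, never a delimiter
            if depth == 0:
                out.append(field_body[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            if depth == 0:
                out.append(" ")  # the whole comment collapses to one space
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


split_word = strip_comments("Fr(foo)om")
nested = strip_comments("from foo (a (nested) comment) by bar")
quoted = strip_comments('"(not a comment)"')
```

Here split_word comes out as "Fr om", matching the parse described earlier in this message, and the nested comment disappears as a unit.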


  Also, the
 inclusion of the 

Re: Time in the log file is incorrect?

2005-03-18 Thread Matt Kettler
At 06:37 AM 3/18/2005, David Suen wrote:
Hi all,
I just read my spamd log file and I found that the time in the log is
incorrect. I just sent an email to myself and here is the log:
@4000423abce8189d6a1c 2005-03-18 11:34:54 [21095] snip
whereas right now the time is Fri Mar 18 22:36:42 EST 2005.
Given that your time zone is GMT +11, and the difference between those two 
times is 11 hours, I'd check the server in question and make sure that 
/etc/localtime is in fact the correct timezone, and not GMT.
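
The arithmetic behind that suggestion can be checked quickly. The sketch below assumes the logged time is GMT and uses a fixed UTC+11 offset for the local zone, as described above; the variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=11))  # fixed UTC+11 offset, per the reply above

# Time printed in the spamd log, read as GMT.
logged = datetime(2005, 3, 18, 11, 34, 54, tzinfo=timezone.utc)
as_local = logged.astimezone(LOCAL)

# Wall-clock time David reported shortly afterwards.
wall_clock = datetime(2005, 3, 18, 22, 36, 42, tzinfo=LOCAL)
gap = wall_clock - as_local

print(as_local.strftime("%Y-%m-%d %H:%M:%S"), gap)
```

Read that way, the log entry is less than two minutes older than the reported wall-clock time, which fits a logger running in GMT rather than a wrong clock.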




DCC License Change

2005-03-18 Thread Thomas Cameron
Has anyone been following the DCC license change thread on the DCC mailing 
list?  Is anyone going to be negatively affected by it?

I run a small mail server for my own small business, so I don't imagine that 
it will affect me.  Does anyone have any opinions on the licensing change?

Thomas 



Re: SPAM/HAM folder

2005-03-18 Thread Steven Dickenson
Norman Zhang wrote:
On my SA Gateway, I have no local box except root. Should I forward
HAM/SPAM to local box? Mail are not meant for local delivery here.
I assume you mean for Bayesian training.  In that case, you can't use 
forwarded mail for that, as Bayesian training depends on having the 
original message intact.  If you try and train on forwarded messages, 
your Bayes database will get real ugly real quick.

We use an Exchange public folder that gets messages dragged to it, and 
a Perl script on the Exim gateway box that grabs messages from the 
public folder via IMAP and trains on them.  It's not a perfect system, as 
users have to figure out how to drag and drop the messages into the 
public folder, plus Exchange will strip some headers out and add some of 
its own when you access a message through IMAP, but it's better than nothing.

Steven
--
Steven Dickenson [EMAIL PROTECTED]
http://www.mrchuckles.net


Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Alexander Bochmann
...on Fri, Mar 18, 2005 at 08:52:23AM -0500, Yang Xiao wrote:

  On Fri, 18 Mar 2005 13:48:46 +, Duncan Hill [EMAIL PROTECTED] wrote:
   In a large number of cases, the secondary MX is not configured to know the
   list of valid users etc, and may be configured to pass directly to the
   internal mail server, bypassing protections on the primary relay.
  hm...I'd be interested to know what's the percentage is like for this
  kind of settings just to feed my curiousity, because it totally
  doesn't make sense to me , it's like settings up a secondary firewall
  with no blocking rules, what good is it?

It surely doesn't make sense if the secondary MX is 
under your control, but there are many setups where 
the ISP or someone else runs a backup MX for his 
customer's domains as a service. With this configuration, 
the secondary MX will usually not know about valid users 
in the destination domain.

Therefore it makes sense for the spammers to deliver 
mail to the secondary MX, as they can always claim 
that 100% of the mails have been successfully delivered.

Alex.



Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Larry Starr
On Friday 18 March 2005 08:17, Alexander Bochmann wrote:
 ...on Fri, Mar 18, 2005 at 08:52:23AM -0500, Yang Xiao wrote:
   On Fri, 18 Mar 2005 13:48:46 +, Duncan Hill [EMAIL PROTECTED] 
wrote:
In a large number of cases, the secondary MX is not configured to know
the list of valid users etc, and may be configured to pass directly to
the internal mail server, bypassing protections on the primary relay.
  
   hm...I'd be interested to know what's the percentage is like for this
   kind of settings just to feed my curiousity, because it totally
   doesn't make sense to me , it's like settings up a secondary firewall
   with no blocking rules, what good is it?

 It surely doesn't make sense if the secondary MX is
 under your control, but there are many setups where
 the ISP or someone else runs a backup MX for his
 customer's domains as a service. With this configuration,
 the secondary MX will usually not know about valid users
 in the destination domain.

 Therefore it makes sense for the spammers to deliver
 mail to the secondary MX, as they can always claim
 that 100% of the mails have been successfully delivered.

 Alex.

That, in fact, is the setup that I am operating and, yes, most of what comes 
through my secondary MX, at my ISP, is SPAM.   Some time ago I implemented a 
rule that adds a (small) spam score for mail received via my secondary MX.
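
The rule itself is not shown here. As a rough illustration of the idea only, the sketch below looks for a Received header stamped by the secondary MX. The host name mx2.example.net and the function name are hypothetical, and a real setup would only trust Received headers added by its own relays.

```python
from email import message_from_string

SECONDARY_MX = "mx2.example.net"  # hypothetical name of the backup MX


def came_via_secondary_mx(raw_message: str, secondary: str = SECONDARY_MX) -> bool:
    """True if any Received header says the message was accepted by `secondary`."""
    msg = message_from_string(raw_message)
    needle = f"by {secondary.lower()}"
    for header in msg.get_all("Received", []):
        # Collapse folded lines and runs of whitespace before searching.
        if needle in " ".join(header.split()).lower():
            return True
    return False


relayed = (
    "Received: from mx2.example.net (mx2.example.net [192.0.2.2])\n"
    "\tby mail.example.net with ESMTP; Fri, 18 Mar 2005 08:00:00 -0600\n"
    "Received: from client.example.org ([198.51.100.7])\n"
    "\tby mx2.example.net with SMTP; Fri, 18 Mar 2005 07:59:00 -0600\n"
    "Subject: test\n\nbody\n"
)
direct = (
    "Received: from client.example.org ([198.51.100.7])\n"
    "\tby mail.example.net with SMTP; Fri, 18 Mar 2005 07:59:00 -0600\n"
    "Subject: test\n\nbody\n"
)
via_backup = came_via_secondary_mx(relayed)
via_primary_only = came_via_secondary_mx(direct)
```

A match like this would then justify a small score bump, as described above, rather than outright rejection.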

-- 
Larry G. Starr - [EMAIL PROTECTED] or [EMAIL PROTECTED]
Software Engineer: Full Compass Systems LTD.
Phone: 608-831-7330 x 1347  FAX: 608-831-6330
===
There are only three sports: bullfighting, mountaineering and motor
racing, all the rest are merely games! - Ernest Hemmingway



hits -,

2005-03-18 Thread Andy Hester
Hello,
My spam filter (postfix/amavisd/sa/clamav) has been working well for 3 +
months now but several days ago I started getting reports of increased
spam getting through.  I checked the mail logs and see some messages are
scored and others are listed with hits -,.  It seems like the
legitimate email is getting scored while the spam is not being scored.
Does this sound familiar to anyone?  Any help would be appreciated
greatly.  I can't figure out what is going on.

Thanks,
Andy


Re: Network Tests

2005-03-18 Thread Matt Kettler
Daniel A. de Araujo wrote:
Hi guys,
I have SpamAssassin 2.63 with Amavis installed on my box, and now I am
trying to enable network tests with SpamcopURI.
It's working, but the delivery of messages is very slow when network tests
are enabled, so I had to disable them.
Any ideas to make the delivery of messages faster with network tests enabled?

First, a warning: DO NOT run SA 2.63 on a production server. Upgrade to 
2.64 or 3.x, because 2.63 has a MIME parsing bug that can be used to DoS 
your server.

As for speed:
1) Run a caching nameserver on the same box as SA.
2) Run a local mirror of some of the RBLs for which you can get rsync 
access to the zone files.
3) Experiment to see which specific network tests are slow by setting 
their score to 0 one at a time.
4) If you use DCC, run a local server if you've got a high volume of 
messages.
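
To get a rough idea of which DNS-based tests are slow before touching any scores, the lookups can be timed outside SA. The following is a minimal Python sketch, not part of SpamAssassin: the function names are made up for illustration, and the zone names you pass in are whatever lists you actually query.

```python
import socket
import time


def dnsbl_query_name(ipv4: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ipv4!r}")
    return ".".join(reversed(octets)) + "." + zone


def time_lookups(ipv4, zones, resolve=socket.gethostbyname):
    """Return {zone: seconds} for one lookup per zone."""
    timings = {}
    for zone in zones:
        name = dnsbl_query_name(ipv4, zone)
        start = time.perf_counter()
        try:
            resolve(name)
        except OSError:
            # NXDOMAIN (not listed) or a timeout; only the duration matters here.
            pass
        timings[zone] = time.perf_counter() - start
    return timings
```

Running it twice against the same address also shows how much a local caching nameserver helps, since the second round should be answered from cache.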




Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Kenneth Porter
--On Friday, March 18, 2005 3:17 PM +0100 Alexander Bochmann 
[EMAIL PROTECTED] wrote:

It surely doesn't make sense if the secondary MX is
under your control, but there are many setups where
the ISP or someone else runs a backup MX for his
customer's domains as a service. With this configuration,
the secondary MX will usually not know about valid users
in the destination domain.
Therefore it makes sense for the spammers to deliver
mail to the secondary MX, as they can always claim
that 100% of the mails have been successfully delivered.

One possibility is to list your primary again as the tertiary, possibly 
under a different name and/or IP address. Spammers that deliver in reverse 
MX order will still end up trying to deliver to your primary first.

You could also list a bogus server in IP dark space (i.e. an address known 
to have no listening server) so that the spammer must check the empty 
address first. Even better is when there's a host there that drops packets 
(no TCP reset or ICMP port unreachable reply) to port 25, so that the 
spammer must time out the TCP connection attempt.
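
The effect of listing the primary again as the last MX can be seen in a small Python sketch. The hostnames and preference values are hypothetical, and the "reverse" sender simply models a spammer that tries the highest-numbered MX first.

```python
def delivery_order(mx_records, reverse=False):
    """Sort (preference, host) pairs: lowest preference first, or highest first if reverse=True."""
    return [host for _, host in sorted(mx_records, reverse=reverse)]


# Hypothetical zone: the primary is listed a second time, under another name,
# as the highest-numbered (least preferred) MX.
MX = [
    (10, "mx1.example.com"),
    (20, "backup.isp.example"),
    (30, "mx1-alias.example.com"),
]

normal_sender = delivery_order(MX)
reverse_sender = delivery_order(MX, reverse=True)
```

A well-behaved sender starts at `mx1.example.com`, and a sender working in reverse order starts at `mx1-alias.example.com`, which is the same machine, so both meet the primary's filtering before they ever reach the ISP's backup.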


Re: DCC License Change

2005-03-18 Thread Matt Kettler
Thomas Cameron wrote:
Has anyone been following the DCC license change thread on the DCC 
mailing list?  Is anyone going to be negatively affected by it?

I run a small mail server for my own small business, so I don't 
imagine that it will affect me.  Does anyone have any opinions on the 
licensing change?

Well, I can give you my opinions, but they mean nothing whatsoever.
I think Vernon's opinions matter much more:
http://www.rhyolite.com/pipermail/dcc/2005/002575.html

Basically, the primary target is those specifically selling managed 
services and appliances.

In general the date-based archive is here, for reference:
http://www.rhyolite.com/pipermail/dcc/2005/date.html


Re: DCC License Change

2005-03-18 Thread Theo Van Dinter
On Fri, Mar 18, 2005 at 11:54:26AM -0500, Matt Kettler wrote:
 imagine that it will affect me.  Does anyone have any opinions on the 
 licensing change?
 
 Basically the primary target is those specifically selling managed 
 services and appliances.

This was the first I've heard of a license change, but it means that DCC
will have to be disabled by default in SA, for the same reason as Razor.

-- 
Randomly Generated Tagline:
Disk storage does not only come in 3.5-or-5.25-inch squares.  A third
 type of storage medium-the CD-ROM-is spherical.   - PC Novice




Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Kelson
Larry Starr wrote:
On Friday 18 March 2005 08:17, Alexander Bochmann wrote:
there are many setups where
the ISP or someone else runs a backup MX for his
customer's domains as a service. With this configuration,
the secondary MX will usually not know about valid users
in the destination domain.
That, in fact, is the setup that I am operating and, yes, most of what comes 
through my secondary MX, at my ISP, is SPAM.   Some time ago I implemented a 
rule that adds a (small) spam score for mail received via my secondary MX.

I'm on the flip side of that: we provide secondary MX services for some 
of our customers, and I've started adding a small bonus score for mail 
being sent *to* them through our server.  I've also added meta-rules to 
treat certain rules more harshly.

The really annoying thing, from our standpoint, is the backscatter we 
have to process:

1. Spammer sends to secondary MX (us).
2. We filter out some of the more obvious spam (for the most part using
   our regular criteria).
3. We relay what's left to the primary MX.
4. Primary MX rejects mail to nonexistent users and mail that trips
   their own spam filters.
5. We generate DSNs that go to third parties or nonexistent hosts,
   contributing to backscatter and cluttering up our outbound queue.

The backscatter becomes a real problem in the legitimate relay 
situation, because it's basically unavoidable.  If the spam is sent 
directly to you, you can accept it, discard it, or reject it, and it 
stops.  But if you're relaying to someone, and *they* reject it, now you 
have to decide whether to generate a DSN or not.  We've actually set up 
a separate queue for bounces that aren't delivered immediately, so that 
it won't bog down normal mail.

--
Kelson Vibber
SpeedGate Communications www.speed.net


Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Chr. von Stuckrad
On Fri, Mar 18, 2005 at 10:24:25AM -0800, Kelson wrote:
...
 5. We generate DSNs that go to third parties or nonexistent hosts,
contributing to backscatter and cluttering up our outbound queue.
...
Even worse, the bounces sent by _our_ MTA got us listed on the
SpamCop RBL for hitting spamtraps with those bounces! So being a
secondary MX might even disrupt your (own) service, and only the
second queue you mentioned might have helped against that! But we
don't have THAT yet.

Stucki  (bounce-annoyed postmaster)

-- 
Christoph von Stuckrad * * |nickname |[EMAIL PROTECTED]\
Freie Universitaet Berlin  |/_*|'stucki' |Tel(days):+49 30 838-75 459|
Fachbereich Mathematik, EDV|\ *|if online|Tel(else):+49 30 77 39 6600|
Arnimallee 2-6/14195 Berlin* * |on IRCnet|Fax(alle):+49 30 838-75454/


RE: Spammers Target Secondary MX hosts?

2005-03-18 Thread Matthew.van.Eerde
Kelson wrote:
 Larry Starr wrote:
 On Friday 18 March 2005 08:17, Alexander Bochmann wrote:
 there are many setups where
 the ISP or someone else runs a backup MX for his
 customer's domains as a service. With this configuration,
 the secondary MX will usually not know about valid users
 in the destination domain.
 
 That, in fact, is the setup that I am operating and, yes, most of
 what comes through my secondary MX, at my ISP, is SPAM.   Some time
 ago I implemented a rule that adds a (small) spam score for mail
 received via my secondary MX. 
 
 I'm on the flip side of that: we provide secondary MX services for
 some of our customers, and I've started adding a small bonus score
 for mail being sent *to* them through our server.  I've also added
 meta-rules to treat certain rules more harshly.
 
 The really annoying thing, from our standpoint, is the backscatter we
 have to process:
 
 1. Spammer sends to secondary MX (us).
 2. We filter out some of the more obvious spam (for the most part
 using our regular criteria).
 3. We relay what's left to the primary MX.
 4. Primary MX rejects mail to nonexistent users and mail that trips
 their own spam filters.
 5. We generate DSNs that go to third parties or nonexistent hosts,
 contributing to backscatter and cluttering up our outbound queue.
 
 The backscatter becomes a real problem in the legitimate relay
 situation, because it's basically unavoidable.  If the spam is sent
 directly to you, you can accept it, discard it, or reject it, and it
 stops.  But if you're relaying to someone, and *they* reject it, now
 you have to decide whether to generate a DSN or not.  We've actually
 set up a separate queue for bounces that aren't delivered
 immediately, so that it won't bog down normal mail.

Two solutions occur to me:
1) Allow a way for the secondary MX to tell whether the primary MX is up;
   if it is, don't accept any connections.
2) Allow a way for the secondary MX to tell which email addresses on the
   primary MX are valid (LDAP occurs to me).
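
Both ideas amount to deciding at SMTP time instead of accepting and bouncing later. A minimal Python sketch of that decision follows. The function name, the reply texts, and the choice of a 451 reply at RCPT time (rather than refusing the connection outright) are assumptions for illustration; how the list of valid recipients is obtained (LDAP or otherwise) is left out.

```python
def rcpt_reply(recipient, valid_recipients, primary_is_up):
    """Return the (code, text) a secondary MX could give in response to RCPT TO.

    valid_recipients is a set of lowercased addresses exported from the primary.
    """
    if primary_is_up:
        # Option 1: a temporary failure sends the client back to the primary.
        return 451, "primary MX is available, please use it"
    if recipient.lower() not in valid_recipients:
        # Option 2: reject unknown users during the session, so the
        # secondary never has to generate a DSN for them.
        return 550, "no such user"
    return 250, "ok"
```

Because the rejection happens inside the SMTP session, the responsibility for any bounce stays with the connecting client, which removes step 5 of the backscatter sequence quoted above.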

Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
perl -emap{y/a-z/l-za-k/;print}shift Jjhi pcdiwtg Ptga wprztg, 


Re: DCC License Change

2005-03-18 Thread Justin Mason
Theo Van Dinter writes:
 On Fri, Mar 18, 2005 at 11:54:26AM -0500, Matt Kettler wrote:
  imagine that it will affect me.  Does anyone have any opinions on the 
  licensing change?
  
  Basically the primary target is those specifically selling managed 
  services and appliances.
 
 This was the first I've heard of a license change, but it means that DCC
 will have to be disabled by default in SA, for the same reason as Razor.

Well, I guess this gives us a good reason to finally get around to
writing our own hashing subsystem...

--j.



Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Alexander Bochmann
...on Fri, Mar 18, 2005 at 10:24:25AM -0800, Kelson wrote:

  The backscatter becomes a real problem in the legitimate relay 
  situation, because it's basically unavoidable.  If the spam is sent 
  directly to you, you can accept it, discard it, or reject it, and it 
  stops.  But if you're relaying to someone, and *they* reject it, now you 
  have to decide whether to generate a DSN or not.  We've actually set up 

When I was in that situation, my solution turned out 
to be milter-ahead, http://www.milter.info/milter-ahead/index.shtml
but that won't help you if you're not running sendmail :)
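
For anyone not on sendmail, the general call-ahead technique can be approximated with a short probe against the primary. The Python sketch below is not milter-ahead and makes no claim about how that milter works internally; the function name and the `smtp_factory` parameter are made up for the example, and only standard `smtplib` calls are used.

```python
import smtplib


def probe_recipient(primary_host, recipient, smtp_factory=smtplib.SMTP):
    """Ask the primary MX whether it would accept a recipient; return its reply code.

    Raises OSError (or an smtplib exception) if the primary cannot be reached,
    in which case the caller has to fall back to accepting and queueing.
    """
    smtp = smtp_factory(primary_host, 25, timeout=30)
    try:
        smtp.helo()
        smtp.mail("")  # null reverse-path, as used for bounces
        code, _ = smtp.rcpt(recipient)
        return code
    finally:
        # No DATA is ever sent; the session ends after the RCPT probe.
        smtp.quit()
```

A 5xx code from the probe can be passed straight back to the connecting client, while a 4xx code is better treated as "unknown" than as a rejection.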

Alex.



Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread Kenneth Porter
--On Friday, March 18, 2005 10:24 AM -0800 Kelson [EMAIL PROTECTED] wrote:
But if you're relaying to someone, and *they* reject it, now you have to
decide whether to generate a DSN or not.

Using MIMEDefang, I don't reject mail relayed from my secondary:
http://www.mimedefang.org/kwiki/index.cgi?CheckForMX


Re: Spammers Target Secondary MX hosts?

2005-03-18 Thread List Mail User
...
| One possibility is to list your primary again as the tertiary, possibly
| under a different name and/or IP address. Spammers that deliver in reverse
| MX order will still end up trying to deliver to your primary first.

I tried this and it resulted in mail loops when one of the servers was down.
I like the suggestion below better.



| You could also list a bogus server in IP dark space (i.e. an address known
| to have no listening server) so that the spammer must check the empty
| address first. Even better is when there's a host there that drops packets
| (no TCP reset or ICMP port unreachable reply) to port 25, so that the
| spammer must time out the TCP connection attempt.
|
|

Be very careful if the dark space is not under your control.  Using
a reserved address will get you an rfci listing; using somebody else's address
in the US is fraud (of course, IANAL).  If you do have the space, the best thing
is probably to set up a *very* slow server that always gives a 4xx at the end of
the conversation and preferably does greylisting too (look at the program
from OpenBSD or NetBSD, unfortunately also called spamd, which is part of pf).
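
The greylisting part can be illustrated with a minimal Python sketch. It shows only the basic idea (temporarily fail the first attempt for each client/sender/recipient triplet, accept a retry after a delay) and is not how spamd itself is implemented; the class name and the 300-second delay are assumptions.

```python
class Greylist:
    """Temporarily reject the first attempt from each (client, sender, recipient) triplet."""

    def __init__(self, delay_seconds=300):
        self.delay = delay_seconds
        self.first_seen = {}  # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now):
        """Return 451 until the triplet is at least `delay` seconds old, then 250."""
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first < self.delay:
            return 451  # temporary failure: a real MTA will queue and retry
        return 250
```

Passing `now` in explicitly keeps the sketch easy to test; a real deployment would also expire old triplets so the table does not grow without bound.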

Paul Shupak
[EMAIL PROTECTED]