Re: Perfect bayes filter ratio spam/ham

2005-08-27 Thread Matt Kettler

At 07:15 PM 8/27/2005, Torsten Bronger wrote:

Which bayes filter ratio is better: 1:1 or the natural incoming
ratio?


1:1 actually. I was a strong proponent of natural, but Dan Q corrected me. 
After a lot of thinking about the statistics, it made sense.




Problems with SpamAssassin 3.1 RC1 and MIMEDefang

2005-08-27 Thread John Rudd


This is a problem that some mimedefang people are experiencing with SA 
3.1 rc1.


(mimedefang slave processes are becoming un-killable due to a 
mis-feature in SA 3.1 which messes with SIGCHLD)



Begin forwarded message:


From: "David F. Skoll" <[EMAIL PROTECTED]>
Date: August 27, 2005 6:01:28 PM PDT
To: mimedefang@lists.roaringpenguin.com
Subject: Re: [Mimedefang] Problems with SpamAssassin 3.1 RC1 and MIMEDefang


Martin Blapp wrote:



Please download SA3.1 Pre 1 and try yourself.


I downloaded it, and didn't have to try anything; the problem was obvious
after I read the SA 3.1 code.

It's a bug in SA 3.1.

Look at the file "Dns.pm", in the routine enter_helper_run_mode.
We see this code:

  # enforce SIGCHLD as DEFAULT; IGNORE causes spurious kernel warnings
  # on Red Hat NPTL kernels (bug 1536), and some users of the
  # Mail::SpamAssassin modules set SIGCHLD to be a fatal signal
  # for some reason! (bug 3507)
  $self->{old_sigchld_handler} = $SIG{CHLD};
  $SIG{CHLD} = 'DEFAULT';

There's a leave_helper_run_mode that resets the SIGCHLD handler to its
old value.

HOWEVER: If the slave dies sometime between enter_helper_run_mode and
leave_helper_run_mode, the multiplexor never gets a SIGCHLD signal.
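
The effect described above can be reproduced outside SpamAssassin. What follows is a minimal sketch in Python rather than Perl, assuming a POSIX system; the function name `sigchld_callbacks` is made up for illustration and is not part of SpamAssassin or MIMEDefang. It installs a SIGCHLD handler, optionally swaps it for the default disposition the way the quoted code does, lets a child exit, and counts how often the handler ran.

```python
import os
import signal
import time


def sigchld_callbacks(swap_to_default: bool) -> int:
    """Fork one short-lived child and count SIGCHLD callbacks in the parent."""
    seen = []
    original = signal.signal(signal.SIGCHLD, lambda signum, frame: seen.append(signum))
    try:
        if swap_to_default:
            # Same move as the quoted Perl: remember the handler, install DEFAULT.
            saved = signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: exit immediately
        os.waitpid(pid, 0)     # parent: reap the child so no zombie remains
        if swap_to_default:
            # Analogous to leave_helper_run_mode: put the old handler back.
            signal.signal(signal.SIGCHLD, saved)
        # Give a pending Python-level handler a short window to run.
        deadline = time.monotonic() + 0.5
        while not seen and time.monotonic() < deadline:
            time.sleep(0.01)
        return len(seen)
    finally:
        if original is not None:
            signal.signal(signal.SIGCHLD, original)


if __name__ == "__main__":
    print("handler kept:    ", sigchld_callbacks(False))
    print("handler swapped: ", sigchld_callbacks(True))
```

With the handler kept, the parent is notified of the child's exit. With the handler swapped to the default while the child dies, the notification is discarded, and restoring the handler afterwards does not bring it back. That is the window between enter_helper_run_mode and leave_helper_run_mode.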

I don't know why the SA developers are even monkeying with the SIGCHLD
handler in the Perl module; you'd have to ask them.  It seems like a bad
idea to me.

I think I have a workaround; I'll release a beta soon.
In the meantime, I believe that turning off the embedded interpreter
will make it work properly.

Regards,

David.




Re: HELO_DYNAMIC_IPADDR - score too high?

2005-08-27 Thread wolfgang
In an older episode (Saturday, 27. August 2005 19:24), Robert Menschel wrote:

> If you can send me the full email, with headers, so I can compose a
> whitelist_from_rcvd rule for it, and if you are personally certain
> they do not send spam from that From address, I'll add an entry for
> them into 70_sare_whitelist.cf

For the record: mail sent to Robert.





Perfect bayes filter ratio spam/ham

2005-08-27 Thread Torsten Bronger
Hi there!

Which bayes filter ratio is better: 1:1 or the natural incoming
ratio?

Bye,
Torsten.

-- 
Torsten Bronger, aquisgrana, europa vetus            ICQ 264-296-646



Re: sa-learn over already-scanned spam

2005-08-27 Thread Loren Wilton
> If I pass this directory to sa-learn, will sa-learn detect the SA
> part and skip over it, or do I have to fear that the SA messages
> skew my bayes data?

Normally SA will do the right thing and recognize the markup.  If this is
from a really old version of SA and you are learning on a recent version, I
suppose there could potentially be problems.  However, I would tend to not
learn spam more than about 6 months old anyway.

Loren



sa-learn over already-scanned spam

2005-08-27 Thread Torsten Bronger
Hi there!

I have a directory with my old spam.  Most of it has been recognised
by SA as such, so its body got replaced by an SA message with the
original body moved to an attachment.

If I pass this directory to sa-learn, will sa-learn detect the SA
part and skip over it, or do I have to fear that the SA messages
skew my bayes data?

Bye,
Torsten.

-- 
Torsten Bronger, aquisgrana, europa vetus            ICQ 264-296-646



Re: SURBL Redirection Problem

2005-08-27 Thread Daryl C. W. O'Shea

Craig McLean wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

3.1.0-rc1 nailed it to the wall.

Craig.


<...>


domain
|  4.5 URIBL_SC_SURBL Contains an URL listed in the SC SURBL
blocklist
| [URIs: moonboard.info]



Did you detect that with a redirector_pattern?  I don't see that 
detected with a stock 3.1.0-rc1 here (no hint of it when SA is run with 
-Duri).


Daryl



DNS cache size for moderately busy sites?

2005-08-27 Thread email builder
Hello,

  We just migrated to Tinydns from BIND and are looking at our cache size
(OK, so I am really talking about dnscache, not tinydns itself).  Looking at
our cache logs from the last 12 hours (2am Friday night to 2pm Saturday
afternoon), I see our "cache motion" is already 75MB of data.  Wow.  That's
in a relatively low activity time for us.  We get an average of somewhere
under 100,000 mails a day.

  I am curious what other people's cache sizes are set to.  If the numbers we
are seeing hold up (especially during peak), and if we wanted to cache 3 days
worth of DNS queries, it seems like we'd need something like a 500MB+ cache
size.  Is it me, or does that seem rather large?  I wonder how efficient
dnscache would be at that size anyway...
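
The estimate above can be written out as arithmetic. This is a rough sketch that assumes cache motion scales linearly with time, which the 12-hour off-peak sample may well understate; the function name `cache_estimate_mb` is made up for illustration.

```python
def cache_estimate_mb(observed_mb: float, observed_hours: float, retention_days: float) -> float:
    """Extrapolate observed cache motion linearly to a retention period."""
    mb_per_day = observed_mb * 24 / observed_hours
    return mb_per_day * retention_days


if __name__ == "__main__":
    # 75 MB seen in 12 hours, and we want to hold 3 days of queries.
    print(cache_estimate_mb(75, 12, 3), "MB")
```

The figures from the post give 150 MB per day and 450 MB for three days, so the "500MB+" guess follows once peak traffic is allowed for.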

Thanks for any tips!




Re: SURBL Redirection Problem

2005-08-27 Thread Craig McLean

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

3.1.0-rc1 nailed it to the wall.

Craig.

Ilan Aisic wrote:
|
|  pts rule name              description
| ---- ---------------------- --------------------------------------------------
|  0.9 RCVD_BY_IP             Received by mail server with no name
| -6.0 USER_IN_WHITELIST_TO   User is listed in 'whitelist_to'
| -0.0 DK_VERIFIED            Domain Keys: signature passes verification
|  0.0 DK_SIGNED              Domain Keys: message has an unverified signature
|  3.2 FUZZY_PHARMACY         BODY: Attempt to obfuscate words in spam
|  1.3 INFO_TLD               URI: Contains an URL in the INFO top-level domain
|  1.0 LOCAL_INFO_TLD         URI: Contains an URL in the INFO top-level domain
|  4.5 URIBL_SC_SURBL         Contains an URL listed in the SC SURBL blocklist
|                             [URIs: moonboard.info]
|  2.1 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
|                             [URIs: moonboard.info]
|  3.0 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
|                             [URIs: moonboard.info]
|  3.8 URIBL_AB_SURBL         Contains an URL listed in the AB SURBL blocklist
|                             [URIs: moonboard.info]
|  2.0 URIBL_XS_SURBL         Has URI in XS - Testing
|                             [URIs: moonboard.info]
|  4.1 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
|                             [URIs: moonboard.info]
|  3.0 URIBL_SC2_SURBL        Has URI in SC2 at http://www.surbl.org/lists.html
|                             [URIs: moonboard.info]
|  1.7 SARE_OBFU_VISIT2       found apparent obfuscation of word used in spam
|
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDEOEtMDDagS2VwJ4RAvTNAJ4j7+6v+Dj/j+JrmE7iwVC5dTLHWwCgtikJ
6x0dpPWA8KhAvFRbH/5yE3k=
=hs1n
-----END PGP SIGNATURE-----


Re: SURBL Redirection Problem

2005-08-27 Thread Loren Wilton
Perhaps changing the uri check would be a short-term fix.  There is a
redirector pattern detector in SA which would be the right thing to fix.

Loren



Re: HELO_DYNAMIC_IPADDR - score too high?

2005-08-27 Thread Robert Menschel
Hello wolfgang,

Saturday, August 27, 2005, 3:50:20 AM, you wrote:

w> we received a Duden newsletter (duden is *the* spelling
w> rules/grammar/dictionary publisher in germany) with the header:

Wolfgang,

If you can send me the full email, with headers, so I can compose a
whitelist_from_rcvd rule for it, and if you are personally certain
they do not send spam from that From address, I'll add an entry for
them into 70_sare_whitelist.cf

Bob Menschel





Re: How does SA detect non-english language?

2005-08-27 Thread Robert Menschel
Hello John,

Friday, August 26, 2005, 6:25:14 AM, you wrote:

JH> Hello,

JH> We have had a complaint from a user that some of his Japanese mail
JH> (being received by us) is always marked by SA as spam. As a University
JH> it is natural for us to receive foreign mail messages.

Understood.

JH>   X-Spam-Status: Yes, score=13.7 required=8.0 tests=BAYES_99,HTML_20_30,
JH> HTML_MESSAGE,MANGLED_LOOK,SARE_HTML_P_MANY3,SARE_RAND_2,
JH> SARE_RECV_IP_218216,SARE_SUB_ENC_ISO2022JP,SARE_SUB_PCT_LETTER,
JH> SUBJ_ALL_CAPS autolearn=unavailable version=3.0.4

JH> Unfortunately at the time I had left included in our site-wide
JH> configuration some of the specific 'ENG' SARE rules, so that explains
JH> the SARE_SUB_ENC_ISO2022JP matching and bumping the score up a bit. The
JH> SARE_RECV_IP_218216 is also a bit worrying (the message may have passed
JH> through a known spam relay).

If you're using the latest SARE version, SARE_RECV_IP_218216 should be
scoring only 0.964, because we have detected ham coming through that
range of servers (though spam:ham > 100:1). If you can send me some
confirmed ham (full emails, headers and all), I can add those to my
corpus and that will help drive the score down.

MANGLED_LOOK is the larger concern, with a score of 2.3. Like the ENG
rules, the MANGLED rules file should not be used if you expect any
significant non-English ham.  I would remove that file from your
collection.

The 70_sare_obfu*.cf file set is slowly replacing MANGLED, and seems
to be successful in avoiding most language problems.

SARE_RAND_2 also scores 2.5 -- That tests for a specific string
suggesting that a broken ratware configuration inserted something like
%RND into the email. I suppose it's possible, but it seems unlikely
that the Japanese email would match that pattern.  If you can send me
the exact email which does so, maybe I can track that down.

SARE_HTML_P_MANY3 scores only 0.217, so that's not much of a concern.

SARE_SUB_PCT_LETTER with a score of 1.152 is also a significant
contributor, matching a percent sign, followed by a single letter,
then word break. There is no percent sign in the raw subject you
posted, so I assume it's in the code after translation. Seems strange.
Again, a copy of that exact email would help me analyze this.
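
To make the description of SARE_SUB_PCT_LETTER concrete, here is an illustrative approximation in Python. The pattern is an assumption reconstructed from the wording above (percent sign, single letter, word break); it is not the actual SARE rule, and the sample subjects are invented.

```python
import re

# Assumed approximation of "percent sign, single letter, then word break".
PCT_LETTER = re.compile(r"%[A-Za-z]\b")


def looks_like_pct_letter(subject: str) -> bool:
    """Return True if the subject contains a '%x'-style leftover template token."""
    return PCT_LETTER.search(subject) is not None


if __name__ == "__main__":
    for subject in ("Hello %N, your order", "Save 100% today", "Token %RND left in"):
        print(repr(subject), looks_like_pct_letter(subject))
```

Under this approximation a plain "100%" does not match, and neither does "%RND", since the letter after the percent sign is followed by more letters rather than a word break.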

The biggest concern, as Matt pointed out, is your BAYES_99. If this is
indeed ham, then you need to train on these messages as ham, because
your Bayes system firmly believes that they are spam.
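
The scores quoted in this message can be tallied against the header to see how much each change would help. This is only a sketch using the figures given above (total 13.7, required 8.0, and the five per-rule scores mentioned); the individual scores of BAYES_99 and the other remaining tests are not stated in the thread, so they are lumped together as an unattributed remainder.

```python
TOTAL = 13.7      # score= from the X-Spam-Status header
REQUIRED = 8.0    # required= from the same header

# Per-rule scores as quoted in this message.
KNOWN = {
    "SARE_RECV_IP_218216": 0.964,
    "MANGLED_LOOK": 2.3,
    "SARE_RAND_2": 2.5,
    "SARE_HTML_P_MANY3": 0.217,
    "SARE_SUB_PCT_LETTER": 1.152,
}


def score_without(*rules: str) -> float:
    """Total score if the named rules no longer fired."""
    return round(TOTAL - sum(KNOWN[r] for r in rules), 3)


# Share of the total not explained by the five quoted scores
# (BAYES_99, HTML_20_30, HTML_MESSAGE, SARE_SUB_ENC_ISO2022JP, SUBJ_ALL_CAPS).
unattributed = round(TOTAL - sum(KNOWN.values()), 3)

if __name__ == "__main__":
    print("without MANGLED_LOOK:", score_without("MANGLED_LOOK"), "required:", REQUIRED)
    print("unattributed remainder:", unattributed)
```

Dropping MANGLED_LOOK alone leaves 11.4, still well above 8.0, which is consistent with the advice that retraining Bayes matters more than removing any single rule file.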

Bob Menschel





Re: HELO_DYNAMIC_IPADDR - score too high?

2005-08-27 Thread List Mail User
>Hi,
>
>we received a Duden newsletter (duden is *the* spelling 
>rules/grammar/dictionary publisher in germany) with the header:
>
>Received: from ds80-237-180-34.dedicated.hosteurope.de 
>(ds80-237-180-34.dedicated.hosteurope.de [80.237.180.34])
>by netra27.desy.de (DesyMail_In_27) with ESMTP id 3B5D6FB90A
>for ; Fri, 26 Aug 2005 17:00:38 +0200 (MEST)
>
>It got, among others, the scores
>4.4 HELO_DYNAMIC_IPADDR
>2.2 DCC_CHECK
>2.4 MIME_HTML_ONLY_MULTI
>
>This makes me wonder if HELO_DYNAMIC_IPADDR should get a lower score in SA in 
>general - I have now lowered its score in our setup to reduce the FP risk.
>
>Cheers,
>
>wolfgang
>
Wolfgang,

Assuming you really do want the newsletter, you should also
be adding it to the DCC whitelist.  That way it won't trigger the
DCC_CHECK *and* you won't be reporting it to the DCC servers (a
separate choice, but one I use for any "signed-up-for" bulk mail);
See the DCC man pages for examples and syntax.


Paul Shupak
[EMAIL PROTECTED]


SURBL Redirection Problem

2005-08-27 Thread Ilan Aisic
This is a snippet from spam content I got:

http://chietaphi.com/catalog/redirect.php?action=url&goto=www.vxneev.moonboard.info/?100aa983aGd9080f4c0bfF3c1362f8e1";>Just
VISlT EPharmaccy-By

It did not trigger any of the URI rules even though moonboard.info is
listed in all the places.
They have exploited a redirector script on chietaphi.com which looks legit.

I think it should not be hard to improve the SA plugin for URI
(check_dnsbl) to also check something as obvious as this redirection. 
Perhaps it can be done with a second call after parsing the string
following the domain name and realizing it contains a URI.
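
The "second pass" idea can be sketched as follows. This is an illustration in Python, not SpamAssassin's plugin code: the function names are made up, the hostname check is deliberately crude, and taking the last two labels as the registered domain is a simplification that ignores multi-part TLDs.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Crude hostname shape check: dot-separated labels ending in a TLD-like label.
HOSTNAME = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)


def embedded_hosts(url: str) -> list[str]:
    """Return hostnames found inside the query-string values of a URL."""
    hosts = []
    for _name, value in parse_qsl(urlsplit(url).query):
        # Redirector targets are often given without a scheme.
        candidate = value if "://" in value else "http://" + value
        host = urlsplit(candidate).hostname
        if host and HOSTNAME.match(host):
            hosts.append(host)
    return hosts


def naive_domain(host: str) -> str:
    """Last two labels of a hostname (simplification, see note above)."""
    return ".".join(host.split(".")[-2:])


if __name__ == "__main__":
    url = ("http://chietaphi.com/catalog/redirect.php?action=url"
           "&goto=www.vxneev.moonboard.info/?100aa983aGd9080f4c0bfF3c1362f8e1")
    for host in embedded_hosts(url):
        print(host, "->", naive_domain(host))
```

For the URL in this message the sketch yields moonboard.info as an additional domain that could be looked up alongside chietaphi.com.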

-- 
Ilan Aisic
Registered Linux User 8124 http://counter.li.org


HELO_DYNAMIC_IPADDR - score too high?

2005-08-27 Thread wolfgang
Hi,

we received a Duden newsletter (duden is *the* spelling 
rules/grammar/dictionary publisher in germany) with the header:

Received: from ds80-237-180-34.dedicated.hosteurope.de 
(ds80-237-180-34.dedicated.hosteurope.de [80.237.180.34])
by netra27.desy.de (DesyMail_In_27) with ESMTP id 3B5D6FB90A
for ; Fri, 26 Aug 2005 17:00:38 +0200 (MEST)

It got, among others, the scores
4.4 HELO_DYNAMIC_IPADDR
2.2 DCC_CHECK
2.4 MIME_HTML_ONLY_MULTI

This makes me wonder if HELO_DYNAMIC_IPADDR should get a lower score in SA in 
general - I have now lowered its score in our setup to reduce the FP risk.

Cheers,

wolfgang



Re: Feature Request: dynamic trusted_networks

2005-08-27 Thread Thomas Hochstein
"jdow" wrote:

>> However, 
>> if a message came from a client who gave SMTP-AUTH, it ought to be 
>> "trusted" (and not subjected to the blacklist checks). 
>
> Would you care to expound on your theory here. What makes you think
> a valid SPF is a sign of a good guy? 

SMTP authentication has nothing - really nothing - to do with SPF.

-thh