Re: PDF rule not matching -- split line content type?

2007-08-16 Thread Chris Lear

* Jo Rhett wrote (16/08/07 07:41):

Since nobody is paying attention


Or they're asleep. Your messages were at 23:44 and 07:41 here.

, let me clarify.  The current rule is 
wrong:


mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i


meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY


This evaluates to exactly the same as this:

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY

I believe that the original rule's intent was this:

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY


I don't think you're right.

The rule looks like this to me:

meta TVD_PDF_FINGER01
   __TVD_MIME_CT_MM        # content-type is multi-part mixed
&& __TVD_MIME_ATT_TP       # and has a text-plain part
&& __TVD_MIME_ATT          # and has an attachment that is either
     __TVD_MIME_ATT_AP     #   application/pdf
     __TVD_MIME_ATT_AOPDF  #   or application/octet-stream.*.pdf
&& !__TVD_BODY             # and has no non-whitespace text content

Your rule would seem to match anything with no non-whitespace text 
content regardless of whether or not a pdf was attached.
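
For reference, the attachment test there is presumably the two mimeheader 
rules quoted above ORed together in a helper meta, something like this 
(reconstructed from the discussion rather than copied from the shipped 
ruleset):

meta __TVD_MIME_ATT  (__TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF)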


I was looking into this very rule about 3 days ago, because of false 
positives (client mailing out auto-generated pdfs which are being 
rejected by messagelabs), and I found that spamassassin -D told me all I 
needed to know about why some e-mail hit this rule and some didn't.


Chris


Re: PDF rule not matching -- split line content type?

2007-08-16 Thread Chris Lear

Jo Rhett wrote:

Chris Lear wrote:

* Jo Rhett wrote (16/08/07 07:41):

Since nobody is paying attention


Or they're asleep. Your messages were at 23:44 and 07:41 here.


, let me clarify.  The current rule is wrong:

mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i


meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY


This evaluates to exactly the same as this:

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY


I believe that the original rule's intent was this:

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY


I don't think you're right.

The rule looks like this to me:

meta TVD_PDF_FINGER01
   __TVD_MIME_CT_MM        # content-type is multi-part mixed
&& __TVD_MIME_ATT_TP       # and has a text-plain part
&& __TVD_MIME_ATT          # and has an attachment that is either
     __TVD_MIME_ATT_AP     #   application/pdf
     __TVD_MIME_ATT_AOPDF  #   or application/octet-stream.*.pdf
&& !__TVD_BODY             # and has no non-whitespace text content

Your rule would seem to match anything with no non-whitespace text 
content regardless of whether or not a pdf was attached.


I did a full analysis of why the rule is broken, line by line in the 
message you replied to.  But I'll do it again.


(dropping "__TVD_MIME_" for ease of typing)

ATT is a meta of ATT_AP *or* ATT_AOPDF.

But the PDF_FINGER01 requires ATT_TP as well as ATT.  This means that 
really it will only work if ATT_TP matches.  If ATT_AOPDF matches then 
it won't match.


No, go back up and read the text I quoted at the top.  Because if this is 
the author's intent then you can shorten the rule, but I somehow don't 
think so.


I read it. I think you got it wrong. The author's intent seems to accord 
with my analysis.




I was looking into this very rule about 3 days ago, because of false 
positives (client mailing out auto-generated pdfs which are being 
rejected by messagelabs), and I found that spamassassin -D told me all 
I needed to know about why some e-mail hit this rule and some didn't.


Perhaps.  But maybe you have difficulty reading the line-by-line 
analysis I posted below, hm?  I have ~200 messages here that are 100% 
spam that would match the fixed rule, which seems to be the author's intent.




As I say, I read it. It was clear from the start that you didn't 
understand why the rule wasn't firing (and TVD, the rule author, 
explained that). It also appeared to me that your rewrite of the rule 
was the result of a misreading of the logic (or a misunderstanding of 
multipart mime). I thought I could elucidate. I stand by my comments, 
except that I misread your rewrite and thought it was looking only for 
text/plain, whereas it's looking only for pdf mime parts. Theo has 
explained it all now anyway, so there's no more to add.


But forgive me. I should have known better than to step in to a Jo Rhett 
thread. I'll try not to do it again.


Chris


Spamd not killing children

2006-10-16 Thread Chris Lear
Subject sounds unpleasantly like incitement to filicide, for which I
apologise.

The problem I'm having is that spamd doesn't seem to be able to clean up
unwanted idle child processes.

Here's the logfile evidence:

Oct 16 00:12:59 marvin spamd[6351]: prefork: child states: III
Oct 16 00:13:09 marvin spamd[18043]: spamd: connection from localhost
[127.0.0.1] at port 35720
Oct 16 00:13:09 marvin spamd[18043]: spamd: setuid to spamd succeeded
Oct 16 00:13:09 marvin spamd[18043]: spamd: checking message
<[EMAIL PROTECTED]> for spamd:210
Oct 16 00:13:12 marvin spamd[25627]: spamd: connection from localhost
[127.0.0.1] at port 35722
Oct 16 00:13:12 marvin spamd[25627]: spamd: setuid to spamd succeeded
Oct 16 00:13:12 marvin spamd[25627]: spamd: checking message
<[EMAIL PROTECTED]> for spamd:210
Oct 16 00:13:14 marvin spamd[18043]: spamd: identified spam (29.7/5.0)
for spamd:210 in 5.3 seconds, 1545 bytes.
Oct 16 00:13:14 marvin spamd[18043]: spamd: result: Y 29 -
BAYES_99,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL
scantime=5.3,size=1545,user=spamd,uid=210,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=35720,mid=<[EMAIL
 PROTECTED]>,bayes=0.891,autolearn=spam
Oct 16 00:13:15 marvin spamd[6351]: prefork: child states: IBK
-------------------------------------------------------------^
[...] Time passes, and spamd continues to work [...]

Oct 16 10:18:00 marvin spamd[6351]: prefork: child states: IIKK
-------------------------------------------------------------^^

spamd seems to be trying to kill child processes to get the number of
threads down to 2. But for some (apparently unreported) reason the
threads don't die, and the server is slowly collecting children marked
as "K".

I recently upgraded spamassassin to 3.1.5, and I also installed
FuzzyOcr, which I suspect might be part of the problem.

Can anyone tell me a) what logs to look in to work out why this has
happened? (I've looked in the FuzzyOcr log, which does show some errors
and timeouts, but apparently none at relevant times), b) whether there's
anything I can do about it (I'll start by disabling FuzzyOcr, but I'd
like to use it), or c) whether there's a spamassassin bug?

I looked at the code in SpamdForkScaling.pm, and I see that there are 2
places where child processes are killed. In one place (sub
child_error_kill, line 134), there is a warn line if the kill fails. In
the other (sub need_to_del_server, line 732) there isn't.

Chris


Re: Spamd not killing children

2006-10-17 Thread Chris Lear

* Chris Lear wrote (16/10/06 10:32):
> The problem I'm having is that spamd doesn't seem to be able to clean up
> unwanted idle child processes.
>
[...]
I've had a look in the spamd code, and I'm now wondering whether my 
problem is related to logging bugs (eg 
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4237). I've set 
logrotate to restart spamd after syslog restarts as per the advice in 
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4316. Hopefully 
this will fix it.

I'm still unsure whether this is a spamd bug or not.

Chris


Re: ALL_TRUSTED creating a problem

2006-10-18 Thread Chris Lear
* Jo Rhett wrote (18/10/06 08:57):
> Matt Kettler wrote:
>>  It's *really* common to separate spamd from the MTA for anyone that's
>> got any decent volume of mail. And that's not a few sites.
> 
> And I guess that I'm saying
> 
> 1. People installing from RPMs and/or Ports (or Portage, etc) expect 
> things to work out of the box.  Having it be broken for them creates a 
> problem very visible if you search for all_trusted in the list archives.
> 
> 2. "Any decent volume of mail" with "separate servers" means it's a 
> customized mail environment CONFIGURED BY EXPERTS :-)
> 
> I dunno.  I would aim for the former, and then provide good docs for the 
> latter.  The former generally don't read the docs, and I prefer to avoid 
> the mailing list noise.
> 

I hope you don't mind an observer's view here.

It seems that Jo wants autodetection to:
1) comply with the documentation
2) just work for most people
3) be easily fixable in other cases

This, it seems to me, is exactly what it does. OK, maybe it doesn't work
in Jo Rhett's system. But defining "most people" as "people who do
things like Jo Rhett" is suspect at best.

I can see a case for saying "autodetection can't possibly work in all
cases, so disable it by default". But saying "it doesn't do what I
expect, so it's broken" seems to show a disregard for the documentation
and all the (lots of) historical discussions about the subject. Anyone
who has seen spam hitting ALL_TRUSTED (as I have) can sort the problem
out with reference to good documentation in minutes. That's if they are
like me and installed spamassassin without really knowing anything about
it. If they were more switched on, they would have read the
documentation first.
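
For what it's worth, the sort-out usually amounts to being explicit about 
your own relays in local.cf, something like the lines below (the addresses 
are placeholders, not a recommendation):

trusted_networks 192.168.0.0/16
trusted_networks 203.0.113.25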


Re: ALL_TRUSTED creating a problem

2006-10-19 Thread Chris Lear

* Jo Rhett wrote (19/10/06 08:55):

Mark wrote:

We cannot really say SA's autodetection is broken, because SA is designed
to be called post-SMTP. Nor that a milter is broken per se for not adding
a Received: header, as that is the responsibility of the MTA itself. But a
milter using SA *can* be said to be broken if it's not providing SA
with the required post-SMTP view of things. Instead of patching SA, or
trying to "fix" it even, any milter using SA should simply DTRT (Do The
Right Thing): which is: add a pseudo Received: header before handing it
over to SA.


You'all are way behind the boat.  We've already patched it to support 
the undocumented requirement.  That's not an issue.


Perhaps SA being focused on "post-SMTP" is the problem here.  Why is 
this the focus?  In the modern world, you want to reject during SMTP not 
send backscatter to the poor folks whose e-mail got forged.


Frankly, a milter environment is the only possible right way to run SA. 
  So why the constant comments as if this is some one-off weird config?




Frankly, anyone who considers the way they do things to be "the only 
possible right way" is in danger of being Just Plain Wrong.


[further spleen-venting withheld]


Re: SA 3.1.7 children hang but don't die

2006-10-19 Thread Chris Lear

* David B Funk wrote (19/10/06 03:47):

On Wed, 18 Oct 2006, Sandy S wrote:


Daryl -
I switched back to 3.1.5 after my last post, and am sorry to report that I'm
still seeing the same issue under 3.1.5.  After running a while, the
processes in a state of K start building up until I manually kill them.

Regretfully (VERY regretfully) turning off FuzzyOCR.

Sandy


I'll second this, SA 3.1.5 & FuzzyOCR on RHEL-AS4

I've been seeing this off & on ever since I added FuzzyOCR.
Logs seem to correlate to FuzzyOCR processing a gif image during a
peak of messages. Get FuzzyOcr.log message:
 FuzzyOcr received timeout after running "10" seconds.




I'm running SA 3.1.5 with FuzzyOCR. I'm seeing errors in the FuzzyOCR 
log, like this:



[2006-10-18 09:34:24] FuzzyOcr received timeout after running "10" seconds.
[2006-10-18 09:49:14] FuzzyOcr received timeout after running "10" seconds.
[2006-10-18 10:09:26] Unexpected error in pipe to external programs. 
   Please check that all helper programs are installed 
and in the correct path.
   (Pipe Command "/usr/bin/gifasm -d 
/tmp/.spamassassin2589Eye8ALtmp/out", Pipe exit code 1 (""), Temporary 
file: "/tmp/.spamassassin25893ZSX3Ltmp")



But I'm no longer getting children in the K state, since I put a spamd 
restart into the logrotate script. I haven't turned off FuzzyOCR which 
is doing an excellent job for me.


This isn't particularly conclusive, I'm afraid, because when I was 
seeing the problem it was sporadic and occasional, so it might just be 
luck, though it's been OK for a few days.


Chris


Re: tmp files being left over from FuzzyOCR?

2006-10-19 Thread Chris Lear

* Bill wrote (19/10/06 14:03):

Since I installed FuzzyOCR I've noticed I'm having a lot of files named
similar to  .spamassassin8932mZBFrtmp  left in my /tmp folder. These are
from FuzzyOCR, correct? The content of these files has lots of spaces,
hyphens, commas with a few readable words and the word "picture" a few
times.

Is there something I need to do to ensure these files are removed? After
I manually remove them I see new tmp files being created and removed but
sometimes a file is NOT removed.


I suspect that if you look in your FuzzyOCR log, you will find errors 
that match the unremoved temp files.


Eg from my FuzzyOCR.log:

[2006-10-18 10:10:47] Unexpected error in pipe to external programs.
  Please check that all helper programs are 
installed and in the correct path.
  (Pipe Command "/usr/bin/gifasm -d 
/tmp/.spamassassin2591CHsvrEtmp/out", Pipe exit code 1 (""), Temporary 
file: "/tmp/.spamassassin2591dNqOn7tmp")


I see that /tmp/.spamassassin2591CHsvrEtmp/ is still there, but 
/tmp/.spamassassin2591dNqOn7tmp isn't.


And another example:

[2006-10-18 09:34:24] FuzzyOcr received timeout after running "10" seconds.

#ls -l /tmp/.spamassassin* | grep 09:34
-rw---  1 spamd users 0 Oct 18 09:34 /tmp/.spamassassin2589Wc3z7Gtmp
-rw---  1 spamd users 23579 Oct 18 09:34 /tmp/.spamassassin2589yvpP1Htmp


Looks like when gifasm fails, you get a dir left over. If there's a 
timeout, you get a file left over.


Chris


Re: tmp files being left over from FuzzyOCR?

2006-10-19 Thread Chris Lear

* Bill wrote (19/10/06 15:29):

I'm using FuzzyOcr-2.3b and I can't find any reference to this option in
any of the FuzzyOCR software I downloaded.

focr_keep_bad_images 0

Here's a sample of the items in my /tmp folder. You said yours were
folders; mine aren't. All of these files are still left behind; at the time I
made this sample it was 9:25.


Look in your FuzzyOCR log. If it's like mine, you will see timeouts like 
this:


[2006-10-18 09:49:14] FuzzyOcr received timeout after running "10" seconds.

If the times on these timeouts match the times on the temp files, then 
that's what's causing them. That logic works for what I'm seeing.




===
CIRCULAR 230 DISCLOSURE: Pursuant to Regulations Governing Practice Before
the Internal Revenue Service, any tax advice contained herein is not
intended or written to be used and cannot be used by a taxpayer for the
purpose of avoiding tax penalties that may be imposed on the taxpayer.
===


Shame. I was hoping to get out of paying some tax.


CONFIDENTIALITY NOTICE:
This electronic mail message and any attached files contain information
intended for the exclusive use of the individual or entity to whom it is
addressed and may contain information that is proprietary, privileged,
confidential and/or exempt from disclosure under applicable law.  If you are
not the intended recipient, you are hereby notified that any viewing,
copying, disclosure or distribution of this information may be subject to
legal restriction or sanction.  Please notify the sender, by electronic mail
or telephone, of any unintended recipients and delete the original message
without making any copies.


I hope I was the intended recipient, but I'm not sure how I can know.


Re: Psst!

2006-10-20 Thread Chris Lear
* Chris Santerre wrote (20/10/06 15:30):
> 
> 
>> -Original Message-
>> From: David B Funk [mailto:[EMAIL PROTECTED]
>> Sent: Friday, October 20, 2006 1:20 AM
>> To: users@spamassassin.apache.org
>> Subject: Re: Psst!
>>
>>
>> On Thu, 19 Oct 2006, Matt Kettler wrote:
>>
>> > Another thing I've been noticing recently.. some idiot has
>> been culling
>> > the web archives of mailing lists, and is trying to send
>> spam emails to
>> > MESSAGE ID's of posts I've made. Check your mail logs!
>> >
>> > One or more of those would make a great spamtrap.
>>
>> Actually this kind of thing has been going on for some time. I still
>> occasionally see spam sent to a Message-ID address derived from
>> a machine that died years ago. The last owner of it was an active
>> Usenet poster and is probably in all kinds of news archives.
> 
> Just curious, but how many people see spam being sent to usernames with
> the first letter dropped? I see a ton in my logs. I believe spammers
> figure [EMAIL PROTECTED] will also have a [EMAIL PROTECTED]  Too bad for
> them...they do not. :)

Loads. Also with a variety of other manglings. One local part is
dwoodhouse, and some rejected variations are:
8jwoodhouse
8odhouse
dhouse
oodhouse
woodhousejwoodhouse
ydoodhouse

I can't see why they bother. Or maybe the address harvester is broken.


Re: I'm thinking about suing Microsoft

2006-10-24 Thread Chris Lear

* Marc Perkel wrote (23/10/06 19:34):
I'm considering filing a lawsuit against Microsoft to try to get an 
order to make them make public security updates for Windows to everyone, 
registered or not.


The idea is that their product Windows creates a toxic byproduct 
(spam, DDoS zombies) that interferes with everyone else's internet usage 
and that they have a responsibility to clean it up. It would be similar 
to a suit where a business that is otherwise legitimate attracts crime 
in a neighborhood or a manufacturer dumping toxic waste into a stream.


Virus-infected spam zombies are a toxic byproduct of their business model 
and it affects all of us and they have a duty to the public to fix it. 
I'm somewhat of a legal expert, not a lawyer though. But just wanted to 
get some feedback on the idea.





Only in America...


Re: I'm thinking about suing Microsoft

2006-10-25 Thread Chris Lear
* Marc Perkel wrote (25/10/06 05:22):
> Europeans have sued Microsoft many times.

For anti-competitive behaviour, maybe. For copyright infringement, perhaps.
But for attracting crime? For discriminating against owners of illegal
software? I hope not.
If you win, of course, you might take on php, perl and other easy-to-use
web scripting languages that allow people to write crime-attracting
sites that are easy targets for IRC bots etc. Plenty of scope for the
Perkel suing machine. Unless your real gripe is simply that Microsoft a)
is successful and b) insists on licensing software. Unfortunately,
neither of these things is illegal in any country as far as I can tell.

> 
> Chris Lear wrote:
>> * Marc Perkel wrote (23/10/06 19:34):
>>> I'm considering filing a lawsuit against Microsoft to try to get an 
>>> order to make them make public security updates for Windows to 
>>> everyone, registered or not.
>>>
>>> The idea is that their product Windows creates a toxic byproduct 
>>> (spam, DDoS zombies) that interferes with everyone else's internet 
>>> usage and that they have a responsibility to clean it up. It would be 
>>> similar to a suit where a business that is otherwise legitimate 
>>> attracts crime in a neighborhood or a manufacturer dumping toxic 
>>> waste into a stream.
>>>
>>> Virus-infected spam zombies are a toxic byproduct of their business 
>>> model and it affects all of us and they have a duty to the public to 
>>> fix it. I'm somewhat of a legal expert, not a lawyer though. But just 
>>> wanted to get some feedback on the idea.
>>>
>>>
>>
>> Only in America...
>>



Re: score=0.0 tests=none -- how can that be???

2006-10-25 Thread Chris Lear
* Debbie D wrote (25/10/06 04:48):
> "Matt Kettler" <[EMAIL PROTECTED]> wrote in message 
> news:[EMAIL PROTECTED]
>> Debbie D wrote:
>>> I'm just not getting it.. I have a whole list of custom rules, I use
>>> RulesDuJour, I have custom scores to mark stuff higher.. I have 
>>> reasonable
>>> limits set.. the users do not adjust tings here, I do..  I use lint when 
>>> I
>>> add scores and rules..
>>>
>>> So tell me.. how in the past week or so I have 11 mails in *my* box that
>>> show:
>>>
>>> X-Spam-Status: No, score=0.0 required=4.5 tests=none
>>>
>> Usually that means a timeout, or your milter was configured to skip SA
>> for the message.
>>
>> How do you call SA? mimedefang? spamc call in procmail.rc?
>>
> 
> Exim 4.52 with SA and ClamAV I use spamc

In that case, the header is (I'm fairly sure) not added by SA, but by
exim. Try stopping spamd. Does exim still add the headers? If so, then
the occasional occurrence is because spamd is overloaded.
Look in the exim mail log for the mail in question. It might give the
answer.

Chris


Re: Amazon / RFCI false positives

2006-11-06 Thread Chris Lear
* Tony Finch wrote (05/11/06 17:43):
> On Sat, 4 Nov 2006, Michael Scheidell wrote:
> 
>> So? Build something better. It's open source. Don't use the RFCI scores,
>> drop them, stop bitching about something YOU can change.
> 
> Well, I've added a -2 for email from Amazon, but I thought other people
> might like a warning.

Thanks. Warning appreciated.

I think that the people who made derogatory claims about "Tony's logic",
or claimed that "you don't understand" had failed to appreciate what
"These messages are wanted by their recipients so should not be
scored as spam by SpamAssassin" means. Anyone who disagrees with that
piece of logic would appear to be using Spamassassin for a purpose that
its designers didn't think of.
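
As an aside, the kind of thing Tony describes is just a negative-scoring 
header rule, roughly like this (rule name and pattern are illustrative, not 
his actual config):

header LOCAL_AMAZON_FROM  From =~ /\@amazon\.com/i
score  LOCAL_AMAZON_FROM  -2.0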

Chris


Re: Amazon / RFCI false positives

2006-11-06 Thread Chris Lear

jdow wrote:

From: "Chris Lear" <[EMAIL PROTECTED]>

* Tony Finch wrote (05/11/06 17:43):

On Sat, 4 Nov 2006, Michael Scheidell wrote:


So? Build something better. It's open source. Don't use the RFCI scores,
drop them, stop bitching about something YOU can change.


Well, I've added a -2 for email from Amazon, but I thought other people
might like a warning.


Thanks. Warning appreciated.

I think that the people who made derogatory claims about "Tony's logic",
or claimed that "you don't understand" had failed to appreciate what
"These messages are wanted by their recipients so should not be
scored as spam by SpamAssassin" means. Anyone who disagrees with that
piece of logic would appear to be using Spamassassin for a purpose that
its designers didn't think of.


Tony's phrasing implied that he thought the scoring was so wrong
that it should be modified by the people who wrote the rule and ran
it against mass checks. That logic is dead wrong.


That logic, right or wrong, is yours, not Tony's.



The correct phrasing might have indicated there is a problem for some
sites with Amazon failing RFCi requiring a special rule to negate
Amazon.com's negative scores on RFCi.


I think that "the correct phrasing" was exactly what was given, in that 
case. I understood it, anyway.




Demanding that the RFCi rules vanish into the night just is not going
to fly. And it indicates flawed thought processes.


Which, again, may or may not be true, but certainly wasn't even vaguely 
hinted at by Tony. These flawed thought processes appear (to me, but 
maybe I'm unusually pedantic) to be imaginary.


Chris


Re: How do I stop these?

2006-11-21 Thread Chris Lear
* John Rudd wrote (20/11/06 15:46):
> John Tice wrote:
>> 
>> On Nov 20, 2006, at 10:00 AM, Nathan Zabaldo wrote:
>> 
>>> I am getting pounded by these types of emails.  Does anyone else get 
>>> these? What rule can I apply to have them killed.  It's driving me 
>>> nuts.  Please help!!!
>> 
>> These are scoring at about 4X my threshold without the SARE stock 
>> ruleset. You may need to tweak your scoring. I find bayes_99 to be reliable.
>> 
>> FROM_LOCAL_NOVOWEL
>> FORGED_RCVD_HELO
>> BAYES_99
>> RCVD_IN_SORBS_DUL
>> RCVD_IN_NJABL_DUL
>> 
> 
> 
> RelayCatcher is doing a fine job of keeping me from seeing most of the 
> spam that's out there, lately.  See any messages on this list with 
> "RelayCatcher" in the subject.  Particularly "RelayCatcher 0.3" in the 
> subject.

...or RelayChecker 0.3.

Chris


Easyjet e-mail scoring very high

2006-12-01 Thread Chris Lear
I got an EasyJet confirmation E-mail that scored like this:

BAYES_00=-2.599
DNS_FROM_RFC_ABUSE=0.2
FORGED_RCVD_HELO=0.135
HTML_FONT_FACE_BAD=0.156
HTML_MESSAGE=0.001
HTML_TINY_FONT=2.324
MARKETING_PARTNERS=1.765
MIME_HTML_MOSTLY=1.102
SARE_OBFU_AMP2B=2.555
SARE_SPEC_LEO_LINE03a=0.408

Which adds to 6.0, and only the Bayes score stopped it being rejected
(I'm rejecting at 6.5). [SA 3.1.3 with recent sa-update+SARE rules]
What's the recommended practice here? Whitelist? Lower the SARE scores?
Remove some less-safe SARE rules? Lower the HTML_TINY_FONT score [which
looks right, but if it's right for me, why not everyone else]? I'd like
all ham to score under 2, ideally. And almost all of it does. But I'd
prefer not to whitelist if possible. I like to feel I can trust SA
without introducing special cases.

Here are the received headers:

Received: from s217124rg180-p.uklond6.savvis.net ([213.174.202.180]
helo=easyjet.com)
by mail.barcombe.net with esmtp (Exim 4.60)
(envelope-from <[EMAIL PROTECTED]>)
id 1GpoFF-0007fV-Ne
for [EMAIL PROTECTED]; Thu, 30 Nov 2006 15:54:47 +0000
Received: from mail pickup service by easyjet.com with Microsoft SMTPSVC;
 Thu, 30 Nov 2006 15:54:50 +0000

I think the "Received: from mail pickup service" line is causing the
SARE_OBFU_AMP2B rule to fire. Am I right? If so, isn't this likely to be
a reasonably common cause of false positives?

Chris


Re: Easyjet e-mail scoring very high

2006-12-01 Thread Chris Lear
* Loren Wilton wrote (01/12/06 13:57):
>> HTML_FONT_FACE_BAD=0.156
>> HTML_MESSAGE=0.001
>> HTML_TINY_FONT=2.324
>> MARKETING_PARTNERS=1.765
>> MIME_HTML_MOSTLY=1.102
>> SARE_OBFU_AMP2B=2.555
>> SARE_SPEC_LEO_LINE03a=0.408
>>
>> I think the "Received: from mail pickup service" line is causing the
>> SARE_OBFU_AMP2B rule to fire. Am I right? If so, isn't this likely to be
> 
> Nope.  All of the rules above are effectively body rules, dealing mostly 
> with various forms of HTML obfuscation.

Thanks for pointing that out. I was being rather dim.

The html contains this sort of thing:
http://www&#46;easyjet&#46;com/EN/Members/

Which looks like the culprit. In fact, every full stop in the html is
represented as &#46; for some reason.

Still wondering though... how do you solve a problem like EasyJet?

Chris


Re: Easyjet e-mail scoring very high

2006-12-01 Thread Chris Lear
* Loren Wilton wrote (01/12/06 14:54):
>> The html contains this sort of thing:
>> http://www&#46;easyjet&#46;com/EN/Members/
>>
>> Which looks like the culprit. In fact, every full stop in the html is
>> represented as &#46; for some reason.
>>
>> Still wondering though... how do you solve a problem like EasyJet?
> 
> 
> Sure looks like spam to me.  ;-)
> 
> Which also looks like just about every airline message I've seen from any 
> airline.  :-(  Apparently they hired spammers to design their marketing 
> campaign mail.
> 
> You could try sending to postmaster or whatever at whichever marketing 
> company is really sending that mail and see if you can get any attention 
> from them.  Probably not, but it might be worth trying.

The trouble is, it's not marketing. It's a confirmation of a flight
booking, which I paid for. The airline doesn't issue tickets. So it's
something I genuinely want in my inbox. It looks like it's generated
directly by the easyjet.com web server.


Re: Easyjet e-mail scoring very high

2006-12-01 Thread Chris Lear
* Adam Stephens wrote (01/12/06 16:10):
> Chris Lear wrote:
>> * Loren Wilton wrote (01/12/06 14:54):
>>   
>>>> The html contains this sort of thing:
>>>> http://www&#46;easyjet&#46;com/EN/Members/
>>>>
>>>> Which looks like the culprit. In fact, every full stop in the html is
>>>> represented as &#46; for some reason.
>>>>
>>>> Still wondering though... how do you solve a problem like EasyJet?
>>>>   
>>> Sure looks like spam to me.  ;-)
>>>
>>> Which also looks like just about every airline message I've seen from any 
>>> airline.  :-(  Apparently they hired spammers to design their marketing 
>>> campaign mail.
>>>
>>> You could try sending to postmaster or whatever at whichever marketing 
>>> company is really sending that mail and see if you can get any attention 
>>> from them.  Probably not, but it might be worth trying.
>>> 
>>
>> The trouble is, it's not marketing. It's a confirmation of a flight
>> booking, which I paid for. The airline doesn't issue tickets. So it's
>> something I genuinely want in my inbox. It looks like it's generated
>> directly by the easyjet.com web server.
>>   
> 
> I had some complaints about that this week; it's obviously a new issue, 
> and it looks like it only applies to the ticket confirmations. Since 
> people really need these booking confirmations I've whitelisted it - 
> using a whitelist_from_rcvd rule seems to catch the booking 
> confirmations only as the marketing material is sent from a different 
> machine.

Thanks for all the advice. I've reluctantly whitelisted them and written
a polite message to [EMAIL PROTECTED] It doesn't seem to have
bounced, so maybe someone will read it. I'll let you know if I get a
response.
Meanwhile, I suppose this is something for others to be aware of if you
run an mta that rejects on high SA scores (and have users that might
want to fly EasyJet).
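
For anyone else in the same position, the whitelist_from_rcvd approach Adam 
describes would look roughly like this (the sender address is a guess from 
the headers I posted earlier, and savvis.net is the relay's reverse DNS, so 
check your own mail before copying it):

whitelist_from_rcvd *@easyjet.com  savvis.net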

Chris


Re: SV: Help with understanding a rule

2006-12-07 Thread Chris Lear

* [EMAIL PROTECTED] wrote (07/12/06 12:03):

The list managers are the first ones who have to change.



Yes, you are probably right. But: there must be a reason why the
rule no_real_name exists? And if there is a rule (written or not)
that From: headers should contain a real name, I want to follow it.

And to follow it I need to convince my IT staff somehow...

So, what is the reason behind no_real_name?


Most MUAs, most of the time, put a real name into mail they send. It's 
standard setup. So not having a real name is, perhaps, a spam sign This 
isn't the same as contravening RFCs. Remember that there's a rule called 
HTML_MESSAGE as well, which might be a spam sign. Both of these are 
bound to hit ham a lot of the time, so scoring them high would be, at 
best, an unusual decision. Scoring them high enough to reject would be 
very unusual.


As it happens, on a server I manage NO_REAL_NAME hits 5% of spam, and 
25% of ham (much of which is not MUA-originated). So it's not a rule I'd 
like to reject on.


But if a mailing list or a user has a "you must provide a real name" 
policy, spamassassin's flexible enough to be able to enforce it.
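
As a minimal sketch, the enforcement is just a local score override, with a 
number chosen to suit the policy (3.0 here is purely illustrative):

score NO_REAL_NAME 3.0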


Chris


Re: Botnet 0.6 plugin for Spam Assassin availabile

2006-12-08 Thread Chris Lear
* John Rudd wrote (07/12/06 18:33):
> (I had a bout of insomnia last night, and got more done than I had 
> pre-announced yesterday...)
> 
> 
> The next version of the Botnet plugin for Spam Assassin is ready.  The 
> install instructions are in the Botnet.txt file, and in the INSTALL file.
> 
> For those who don't know what Botnet is, it's a plugin which tries to 
> identify whether or not the message has been submitted by a 
> botnet/spam-zombie type host by looking at its DNS characteristics (no 
> reverse DNS, reverse DNS that doesn't resolve, or doesn't resolve back 
> to the relay's IP, or reverse DNS that contains things that look like an 
> ISP's client address).  The places I've been using it, and the people I 
> hear about who are using it, have seen a high degree of success.
> 
> It can be downloaded from:
> 
>   http://people.ucsc.edu/~jrudd/spamassassin/Botnet.tar
> 
> 
> As usual, feedback, statistics, bug reports, feature suggestions, are 
> all welcome.

I've been running the BOTNET rules for a little while now. It's the
most-hit rule on the machine (above BAYES_99 even). But I get a
significant number of false positives.

Here's some sa-stats output:

TOP SPAM RULES FIRED
------------------------------------------------------------
RANK  RULE NAME             COUNT  %OFMAIL  %OFSPAM  %OFHAM
------------------------------------------------------------
   1  BOTNET                 1381    66.37    90.86    6.44
   2  BAYES_99               1274    59.50    83.82    0.00
   3  HTML_MESSAGE           1184    75.06    77.89   68.12
   4  BOTNET_CLIENT          1048    50.21    68.95    4.35
   5  BOTNET_IPINHOSTNAME     962    45.45    63.29    1.77
   6  URIBL_BLACK             751    35.12    49.41    0.16
   7  RCVD_IN_SORBS_DUL       725    33.96    47.70    0.32
   8  URIBL_JP_SURBL          688    32.13    45.26    0.00
   9  BOTNET_CLIENTWORDS      608    29.61    40.00    4.19
  10  URIBL_SC_SURBL          524    24.47    34.47    0.00

I think the default score of 5 is far too high. I'm scoring it at 2 at
the moment, which seems OK.

I'd quite like to be able to give more score to BOTNET_IPINHOSTNAME than
BOTNET_CLIENTWORDS, because it seems to give fewer false positives [I
think this will probably improve in 0.6, though]. But this isn't a very
big deal. So that's a mild vote against the __ prefix.
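
In local.cf terms my override is simply:

score BOTNET 2.0

and, as long as the sub-tests keep their non-__ names, the weighting I mean 
would be something like this (the numbers are illustrative only):

score BOTNET_IPINHOSTNAME 1.5
score BOTNET_CLIENTWORDS  0.5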

I added p0f to my arsenal recently, hoping it would work to lower the
false-positive rate of BOTNET by checking for Windows machines, but it
seems that almost all the BOTNET false positives are Exchange servers,
so p0f aggravates rather than mitigates that.

Hope this feedback is useful. Thanks for the plugin. I take the view
that network tests and RBLs (especially URIBLs), rather than body
checks, are the best long-term spam-fighting tools.

Chris


Re: MSRBL

2006-12-15 Thread Chris Lear

Bret Miller wrote:
>> I'm more interested in the Image signatures it has.  If
>> they're really
>> useful and reliable.  I expect that keeping up with image
>> spam wouldn't
>> be very scalable, but it might at least help reduce some load
>> (since we
>> do virus scanning before letting Spam Assassin see a message) for
>> whichever images are known.
>>
>
> I ran about half a day yesterday with both images and spam signatures.
> Images hit a whopping 4 messages and spam hit about 40 with 3 FPs, both
> a very, very low percentage (way under 1%) of spam. ImageInfo does a
> much better job IMO.

I'm using http://www.sanesecurity.com/clamav/ (on my home domain only at 
the moment) which saves sa some work (clamav runs before sa). About a 
third of the spam that was previously caught by sa is now caught by 
clamav instead. I tried MSRBL, but got very few hits. Sorry - no info 
about false positives, because anything that hits is rejected. I haven't 
heard from anyone, though.

I'm surprised by how effective it is.

Chris


Re: Botnet 0.6 plugin for Spam Assassin availabile

2006-12-18 Thread Chris Lear
* Oliver Schulze L. wrote (18/12/06 15:42):
> Nice stats!
> How do you generate them in SA 3.1.7 ?

I use this: http://www.rulesemporium.com/programs/sa-stats-1.0.txt

Chris

> 
> Thanks
> Oliver
> 
> Chris Lear wrote:
>> Here's some sa-stats output:
>>
>> TOP SPAM RULES FIRED
>> ------------------------------------------------------------
>> RANK  RULE NAME             COUNT  %OFMAIL  %OFSPAM  %OFHAM
>> ------------------------------------------------------------
>>    1  BOTNET                 1381    66.37    90.86    6.44
>>    2  BAYES_99               1274    59.50    83.82    0.00
>>    3  HTML_MESSAGE           1184    75.06    77.89   68.12
>>    4  BOTNET_CLIENT          1048    50.21    68.95    4.35
>>    5  BOTNET_IPINHOSTNAME     962    45.45    63.29    1.77
>>    6  URIBL_BLACK             751    35.12    49.41    0.16
>>    7  RCVD_IN_SORBS_DUL       725    33.96    47.70    0.32
>>    8  URIBL_JP_SURBL          688    32.13    45.26    0.00
>>    9  BOTNET_CLIENTWORDS      608    29.61    40.00    4.19
>>   10  URIBL_SC_SURBL          524    24.47    34.47    0.00
>>
>>   
> 



Can spamassassin stop this?

2006-05-12 Thread Chris Lear
I run a fairly uncompromising spamassassin, which rejects mail scoring
5.5 or above (and in my own mailbox, I treat anything scoring over 0 as
suspect). I find that almost all false negatives that slip through are
the result of a not-perfectly-trained site-wide bayes database
[Basically, I train it, so it works well for me. Hardly anyone else
bothers]. I run lots of network tests, which work really well.
But this e-mail looks like it would never get blocked. Does sa have a
hope against this, or have the spammers finally come up with something
that can't be filtered? Even with BAYES_99 (default score 3.5) it would
score just under 5.5.

This is the first time I've noticed a spam e-mail that I can't see how
spamassassin could kill.

Chris

=


Return-path: <[EMAIL PROTECTED]>
Envelope-to: [EMAIL PROTECTED]
Delivery-date: Fri, 12 May 2006 04:52:03 +0100
Received: from bzq-88-155-227-248.red.bezeqint.net ([88.155.227.248])
by marvin.thomasmurray.com with smtp (Exim 4.54)
id 1FeOh7-0001os-6a
for [EMAIL PROTECTED]; Fri, 12 May 2006 04:52:03 +0100
From: "kalyn kari" <[EMAIL PROTECTED]>
To: "dacia katelin" <[EMAIL PROTECTED]>
Subject: Was it love, or was it the thought of being in love?
Date: Fri, 12 May 2006 03:52:03 +0000
Message-ID: <[EMAIL PROTECTED]>
MIME-Version: 1.0
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Mailer: PHP/4.4.0
X-Marvin-Spam-Score: 1.9
X-Marvin-Spam-Level: +
X-Marvin-Spam-Report: Marvin spam report: Score = 1.9.
Tests=BAYES_50=0.001,HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.001,RCVD_IN_NJABL_DUL=1.946
X-Marvin-AntiVirus: Clean








Hullo!
[E]rectile
[D]ysfunction?

We can help! Our site: ochhorfando[dot]com ;) Don't forget
to replace "[dot]" to "."
---
cigarette after another and extinguishing them on the edge of a
full ash tray, with Dolly, and with the old prince, where there
was talk about dinner, about politics, about Marya Petrovna's
illness, and where Levin suddenly forgot for a minute what was
happening, and felt as though he had waked up from sleep; the
other was in her presence, at her pillow, where his heart seemed
breaking and still did not break from sympathetic suffering, and
he prayed to God without ceasing.  And every time he was brought
back from a moment of oblivion by a scream reaching him from the




Re: Suing Spammers

2006-05-15 Thread Chris Lear
* jdow wrote (14/05/06 02:09):
> From: "Gary W. Smith" <[EMAIL PROTECTED]>
>> 
>> On another paw, Craig, do consider who is the injured party. Marc is
>> not. The final recipient, the addressee, is an injured party for the
>> spam in her mailbox. The addressee's ISP is also an injured party due
>> to the (vastly) increased mail volume her servers must handle. They
>> have a tort for filing suit. The person who filters the spam is, one
>> can argue, benefiting from the spam. So it is hard for him to sue
>> and win anything.
>> 
> 
> 
> I disagree.  As a provider you are paying for the acceptance,
> processing, storage and re-transmission of that spam.  It is costing you
> resources which can be quantified.  My boxes have been running at about
> 15% on average, 24x7.  Knowing that spam is 80% of that then you might
> be able to prove in a court of law that it is indeed damaging you
> financially to process this.
> 
> But the burden gets turned back to you to prove this damage.  So the
> question is what the return will be versus the cost of proving it.
> Unless you are processing millions of spams per day from a single
> spammer then more than likely you will be hard pressed to see any type
> of return.
> 
> << jdow >> Waitaminit - Marc heavily implied that he was offering a
> spam filtering service. If that is true then Marc is not being injured.
> The spam is his bread and butter, regardless of how much he wishes to
> be put out of that business.

What if he's not providing a "spam filtering service", but a "clean
e-mail service"? Then the spammer is the enemy, not the bread and
butter. And it's the same service even if all spammers boycott his
servers. Indeed, I imagine he would get more customers if all spammers
boycotted his servers.

> 
> << jdow >> That is why I made comment of three cases, the actual end
> recipient, the actual end recipient's ISP, and the spam filtering
> service provider. Of the three the first can sue and win something
> nominal. In the second case the ISP has so much bulk that the costs
> of the filtering and extra machinery are demonstrable injuries that
> amount to big money. The third case is a person actually making the
> spam filtering his business. In what way is that third person being
> injured?

In just the same way as the ISP, it seems to me. He's trying to provide
a service (delivering legit E-mail), and incurs demonstrable costs.

Chris


Re: Lots of missed spam

2006-06-29 Thread Chris Lear

* Leigh Sharpe wrote (29/06/06 03:03):


This was my first suspicion. I turned off Bayes tests temporarily and
it had little effect. I'm seriously considering resetting the bayes
and starting again


I can recommend that. I had a situation a while ago where the bayes 
database got mysteriously corrupted (sa-learn --dump magic suddenly showed 
nspam way way less than nham). I deleted the whole bayes database, did a 
bit of manual training, let it carry on with the automatic training, and 
it was all fine again in a day or so.


If spam hits BAYES_00 (which carries a negative score), you're better 
off without bayes at all.


But with good bayes, most of the spam you've posted will be blocked. The 
difference between BAYES_00 and BAYES_99 is +6.099. So a small negative 
score with BAYES_00 will be sent over 5 by BAYES_99.


Chris


Re: sa-learn script

2006-07-11 Thread Chris Lear

* Nicholas Payne-Roberts wrote (11/07/06 11:58):
Does anybody know a good way to script sa-learn to daily check on junk 
e-mail folders? i'm currently trying the following line in a cron.daily 
script, but its throwing up an error:


find /home/vpopmail/domains -name ".Junk E-mail" -exec  sa-learn 
--showdots --spam cur {} \;


Your -exec subcommand is the problem. The {} expands to the full path 
of the found file. It doesn't change directory. A version that might work is


find /home/vpopmail/domains -name ".Junk E-mail" -exec  sa-learn 
--showdots --spam {}/cur \;


There's not much point using --showdots in cron, I would have thought, 
but it's probably useful for testing.


To make sure your find command is right, you can do something like this:

find /home/vpopmail/domains -name ".Junk E-mail" -exec echo "sa-learn 
--showdots --spam {}/cur" \;


which will simply echo a list of commands that would get executed.

Chris


Yahoo! SpamGuard spam

2006-07-11 Thread Chris Lear
I was entertained by this. A score of 5.491 added to an e-mail because 
of a Yahoo! advert stuck on the bottom by the Yahoo! MTA.

And the advert is for SpamGuard.


[... headers chopped... ]
X-Spam-Score: 2.9
X-Spam-Level: ++
X-Spam-Report: Spam report: Score = 2.9. 
Tests=BAYES_00=-2.599,DRUGS_ERECTILE=0.493,DRUGS_ERECTILE_OBFU=2.408,FUZZY_VPILL=0.924,SARE_OBFU_VIAGRA=1.666


[... email body chopped ...]
___
All New Yahoo! Mail - Tired of [EMAIL PROTECTED]@! come-ons? Let our SpamGuard 
protect you. http://uk.docs.yahoo.com/nowyoucan.html



Chris


Re: The best way to use Spamassassin is to not use Spamassassin

2006-07-13 Thread Chris Lear

* Marc Perkel wrote (12/07/06 18:30):

Catchy subject line eh?

OK - so what I mean by this is that I now use SA for about 5% of all 
incoming email. The reaso of spam is rejected before I get to SA through 
a fairly large number of tricks that allow me to determine with near 
100% accuracy things that are spam. It is none mostly through behavior 
and karma related lists. Being host blacklisted or URI blacklisted.


I don't know if it's relevant to Marc's point, but it seems to me that 
if SA was reduced to network checks only it would still be a very good 
blocker of spam. And perhaps what Marc is doing is, more or less, moving 
SA's network checks into the MTA and using them to reject rather than 
just score.


I suppose something similar would be to score all the URIBL rules and 
RCVD_IN rules high, and abandon the traditional regex rules.


Network checks are easily the most hit spam rules in SA anyway. Here's a 
bit of sa-stats for spam on a machine I look after (the MTA blocks based 
on sbl-xbl.spamhaus.org before anything gets to SA, so that's not 
represented here):


   1BAYES_99
   2URIBL_BLACK
   3URIBL_SBL
   4URIBL_JP_SURBL
   5URIBL_OB_SURBL
   6RCVD_IN_SORBS_DUL
   7RCVD_IN_NJABL_DUL
   8HTML_MESSAGE
   9FORGED_RCVD_HELO
  10URIBL_SC_SURBL
  11URIBL_WS_SURBL
  12SARE_MLB_Stock6
  13URIBL_AB_SURBL
  14SARE_MLB_Stock1
  15STOCK_NAME_FVGT1



Of course that 5% is very important because that is where I get the
data for the other tests that allow me to bypass filtering.


Even this isn't necessarily so. Data for network tests can be collected 
automatically, by trapping spammers who trawl the web/usenet for 
addresses, those who scan for open port 25s, or those who try high MX's. 
So at least some useful data can be collected without SA, or even human 
intervention.



But - I
want you all to start thinking of a new way to look at spam
filtering.


I'm not sure this is a "new way to look at spam filtering", but I agree 
that content testing against regular expressions is increasingly looking 
like a crude and easily-outwitted technique compared to dns tests. Bayes 
is still good, though.


Re: exim4 + forwarding + spamassassin

2006-07-27 Thread Chris Lear

* Zinski, Steve wrote (27/07/06 02:50):

Not sure how to get exim to pass the initial scan to spamd using a
different user. I've gone through my exim.conf file and changed every
single "user = " entry to a known user and it still insists on using
"nobody" for the first pass.

Another thing that intrigues me is the wording of the log entries.

In the first pass, spamd says that it's "checking" the message. In the
second pass it says "processing" the message.


I think exim only puts the message through spamassassin once (then 
subsequently caches the result, if required), and uses the username set 
up in the acl:


# Reject messages with a SpamAssassin score >7
deny message   = Rejected: Flagged as spam ($spam_score).
 spam  = nobody:true
 ^^ <- **here**
 condition = ${if >{$spam_score_int}{70}{1}{0}}

I have a similar setup, except that I run spamc as a user called spamd. 
This gives site-wide bayes, and works fine.


Is it possible that the second run through spamd is from you running 
spamc after the message is delivered? Ie, not from exim?


There's an exim-users mailing list that's probably a better place for 
these questions.


Chris




-Original Message-
From: Stuart Johnston [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 26, 2006 3:05 PM

To: users@spamassassin.apache.org
Subject: Re: exim4 + forwarding + spamassassin

Your first scan is running as nobody (that's bad) but the second is
running as szinski.  That would explain the BAYES_99.  I'm not sure
about the FORGED_RCVD_HELO and HTML_50_60 though.


Zinski, Steve wrote:

I need some help trying to figure out why spamassassin scores the same
message differently.

I am using an ACL with exim4 to scan email during the actual smtp
connection (so I can reject spam before my server accepts it). It's
pretty straightforward. My ACL looks like this:
 
# Reject messages with a SpamAssassin score >7

deny message   = Rejected: Flagged as spam ($spam_score).
 spam  = nobody:true
 condition = ${if >{$spam_score_int}{70}{1}{0}}

Everything works just fine for mail destined to local accounts, but
there seems to be a discrepancy in spamassassin when mail is delivered
to a forwarded account (the forwarder directs mail to another local
account; i.e., [EMAIL PROTECTED] --> [EMAIL PROTECTED]). What
happens is that spamassassin scores the message low (non-spam) when it
accepts it from the Internet, but then scores it higher (as spam) when
the message is rerouted to the local mailbox. Here is a snippet from
maillog that illustrates this:

Jul 26 07:58:20 vps spamd[7361]: spamd: connection from localhost
[127.0.0.1] at port 56458 
Jul 26 07:58:20 vps spamd[7361]: spamd: setuid to nobody succeeded 
Jul 26 07:58:20 vps spamd[7361]: spamd: checking message
<[EMAIL PROTECTED]> for nobody:99 
Jul 26 07:58:20 vps spamd[7361]: spamd: clean message (2.6/5.0) for
nobody:99 in 0.1 seconds, 2230 bytes. 
Jul 26 07:58:20 vps spamd[7361]: spamd: result: . 2 -

HTML_MESSAGE,URIBL_SBL,URIBL_WS_SURBL


scantime=0.1,size=2230,user=nobody,uid=99,required_score=5.0,rhost=local
host,raddr=127.0.0.1,rport=56458,mid=<[EMAIL PROTECTED]
8>,autolearn=no 
Jul 26 07:58:20 vps spamd[26587]: prefork: child states: II 
Jul 26 07:58:21 vps spamd[7361]: spamd: connection from localhost
[127.0.0.1] at port 56459 
Jul 26 07:58:21 vps spamd[7361]: spamd: setuid to szinski succeeded 
Jul 26 07:58:21 vps spamd[7361]: spamd: processing message
<[EMAIL PROTECTED]> for szinski:503 
Jul 26 07:58:21 vps spamd[7361]: spamd: identified spam (7.5/5.0) for
szinski:503 in 0.6 seconds, 2183 bytes. 
Jul 26 07:58:21 vps spamd[7361]: spamd: result: Y 7 -



BAYES_99,FORGED_RCVD_HELO,HTML_50_60,HTML_MESSAGE,URIBL_SBL,URIBL_WS_SUR

BL


scantime=0.6,size=2183,user=szinski,uid=503,required_score=5.0,rhost=loc
alhost,raddr=127.0.0.1,rport=56459,mid=<[EMAIL PROTECTED]

hn8>,bayes=0.97051713734,autolearn=no

As you can see, during the initial smtp pass (accepting from remote
host) the message is deemed "clean" with a score of 2.6. Then, when

the

same message is delivered to the local account, it's identified as

spam

with a score of 7.5. Unfortunately, my ACL only kicks in during the
first pass so the message gets accepted and delivered instead of
rejected. Anyone know what I might be doing wrong here?

Any help would be greatly appreciated.

Steve Zinski
University of Richmond






Re: Allowing IMAP/POP to Send Email

2006-08-03 Thread Chris Lear

* Marc Perkel wrote (03/08/06 14:39):


Tony Finch wrote:

The reason that message submission is done with SMTP is because of the
number of SMTP extensions that the MUA will want to use, in particular
DSNs, deliver-by, deliver-after, message tracking, and whatever else may
be invented in the future. If you want to make message submission a part
of IMAP and POP then you'll have to re-do all these SMTP extensions twice,
which is a colossal waste of time.


  


Not really - what I'm proposing is that the IMAP connection just pipe 
the message into an SMTP server. The IMAP is acting only as an 
authenticated connection back to SMTP. I'm not suggesting replacing 
SMTP. What I'm suggesting is that POP/IMAP can be used as a transport to 
get the mail there because it's an existing connection, is already 
established, is already authenticated with the credentials of the email 
account, and it isn't a port that people would block like port 25 is.


I'm not trying to replace SMTP. I'm just trying to suggest a better way 
for end users to get outgoing email to the SMTP server.




What if I set up an SMTP server at home behind my ADSL router, collect 
my vanity-domain mail there, and access it via IMAP or POP3? It seems I 
only have one option, which is to send my mail via IMAP to my home 
server. Which then sends via SMTP to... the Internet (or via a 
smarthost). And the home server sending via SMTP is going to look a bit 
like a MUA sending via SMTP. How would you tell the difference? Is a 
home mail server outlawed in the brave new world? Or does my SMTP server 
have to learn to talk IMAP to make message submissions to the ISP's server?


Chris


Re: DEAR_SOMETHING rule scoring issue

2006-08-09 Thread Chris Lear
* Gregory T Pelle wrote (09/08/06 15:14):
> What is the procedure to have a rule score reviewed?
> 
> I have been looking over the scoring for version 3.1.x at
> 
>   http://spamassassin.apache.org/tests_3_1_x.html
> 
> and think that a score of 1.6 is high for the DEAR_SOMETHING rule.  I
> know that our customer support emails have the first line as "Dear
> ...".  It would seem to me that any business that is
> trying to sound professional would have emails that hit this rule.

Where I work I'm always trying to persuade the people who write bulk
e-mail to customers *not* to start it with "Dear ",
because I think it does the opposite of sounding professional. But maybe
it's just me. They are indeed trying to sound professional, and think
that personalising the e-mail with "Dear" will do that, and I don't seem
to win the argument. It hasn't made me lower the DEAR_SOMETHING score,
though.
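
For anyone who does want to change it locally, though, it's only a one-line
override in local.cf (the number is up to you):

score DEAR_SOMETHING 0.5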

Chris


Forwarded spam

2008-07-31 Thread Chris Lear
I'm trying to improve the effectiveness of a spamassassin installation, 
and there's one user who gets a lot of spam that is forwarded from 
another address, which effectively kills the network tests and in some 
cases messes with the BAYES score as well. I want to get rid of it.


My solution to the problem was originally to add the forwarding mtas to 
trusted_networks (seems ironic, but I think this is appropriate).


Unfortunately, this doesn't work, because the headers look like this 
(with apologies for the munging, but it's not my e-mail):


Received: from mta3.iomartmail.com ([62.128.193.153])
by smtp.DOMAIN.com with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
(Exim 4.69)
(envelope-from <[EMAIL PROTECTED]>)
id 1KOUZB-0001Xq-Eb
for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:29 +0100
Received: from mta3.iomartmail.com (localhost.localdomain [127.0.0.1])
	by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with ESMTP id 
m6V9ZOVc018574

for <[EMAIL PROTECTED]>; Thu, 31 Jul 2008 10:35:24 +0100
Received: from p548AAE80.dip0.t-ipconnect.de 
(p548AB09B.dip0.t-ipconnect.de [84.138.176.155])
	by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with SMTP id 
m6V9ZNUK018506

for <[EMAIL PROTECTED]>; Thu, 31 Jul 2008 10:35:24 +0100

[EMAIL PROTECTED] is the original address, which is handled by 
mta[X].iomartmail.com, and it's forwarded to [EMAIL PROTECTED], which is 
handled by smtp.DOMAIN.com.


I can put 62.128.193.153 into trusted_networks, which should make 
spamassassin look at the next header back, but that's another 
iomartmail.com machine (presumably a virus/spam checker), and I'm fairly 
sure adding 127.0.0.1 to trusted_networks would be a mistake.


Question one: Is there a way of getting the network tests working on 
these forwarded e-mails?



My next idea is just to add a load of score to messages to 
ORIGINALDOMAIN.com. Looking in the wiki at 
http://wiki.apache.org/spamassassin/WritingRules#head-36104467608e64f77e1878ec3201073b8180c728 
I see this:


===
Checking the From: line, or any other header, works much the same:

header LOCAL_DEMONSTRATION_FROM From =~ /test\.com/i
score LOCAL_DEMONSTRATION_FROM  0.1

Now, that rule is pretty silly, as it doesn't do much that a 
blacklist_from can't.

===

What I want to do is blacklist_to [EMAIL PROTECTED], but with a 
score of 3 (ie, it's not really a blacklisting). The quote above seems 
to suggest I can do that, but I can't see it in the docs. Question two: 
is it possible to set a score on a blacklisted address?


Finally, I can use header ToCC, and that'll probably do, but I wanted to 
know if there's a better way.
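
Concretely, I'm thinking of something like this (domain munged as above, 
rule name made up):

header   LOCAL_FWD_OLDADDRESS  ToCc =~ /\@ORIGINALDOMAIN\.com/i
score    LOCAL_FWD_OLDADDRESS  3.0
describe LOCAL_FWD_OLDADDRESS  Mail addressed to the forwarded-from domain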


Thanks,
Chris


Re: Forwarded spam

2008-07-31 Thread Chris Lear

* Matt Kettler wrote (31/07/08 11:25):

Chris Lear wrote:
I'm trying to improve the effectiveness of a spamassassin 
installation, and there's one user who gets a lot of spam that is 
forwarded from another address, which effectively kills the network 
tests and in some cases messes with the BAYES score as well. I want to 
get rid of it.


My solution to the problem was originally to add the forwarding mtas 
to trusted_networks (seems ironic, but I think this is appropriate).


Unfortunately, this doesn't work, because the headers look like this 
(with apologies for the munging, but it's not my e-mail):


Received: from mta3.iomartmail.com ([62.128.193.153])
by smtp.DOMAIN.com with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
(Exim 4.69)
(envelope-from <[EMAIL PROTECTED]>)
id 1KOUZB-0001Xq-Eb
for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:29 +0100
Received: from mta3.iomartmail.com (localhost.localdomain [127.0.0.1])
by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with ESMTP id 
m6V9ZOVc018574

for <[EMAIL PROTECTED]>; Thu, 31 Jul 2008 10:35:24 +0100
Received: from p548AAE80.dip0.t-ipconnect.de 
(p548AB09B.dip0.t-ipconnect.de [84.138.176.155])
by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with SMTP id 
m6V9ZNUK018506

for <[EMAIL PROTECTED]>; Thu, 31 Jul 2008 10:35:24 +0100

[EMAIL PROTECTED] is the original address, which is handled by 
mta[X].iomartmail.com, and it's forwarded to [EMAIL PROTECTED], which is 
handled by smtp.DOMAIN.com.


I can put 62.128.193.153 into trusted_networks, which should make 
spamassassin look at the next header back, but that's another 
iomartmail.com machine (presumably a virus/spam checker), and I'm 
fairly sure adding 127.0.0.1 to trusted_networks would be a mistake.
Why would adding 127.0.0.1 to trusted_networks be a mistake? Since trust 
is a path this won't lead to spammers being able to forge trust, as 
they'd have to first get to your system from a trusted IP address. (or 
manage to do a TCP blind-spoofing attack and make it look like it came 
from one)


OK, you've persuaded me. It seemed fishy, but I wasn't being logical. 
I'll do that and keep an eye on it. Don't worry - I'm not going to 
obsess about TCP spoofing.
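
In other words, the local.cf lines would be something like:

trusted_networks 62.128.193.153
trusted_networks 127.0.0.1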




Question one: Is there a way of getting the network tests working on 
these forwarded e-mails?



My next idea is just to add a load of score to messages to 
ORIGINALDOMAIN.com. Looking in the wiki at 
http://wiki.apache.org/spamassassin/WritingRules#head-36104467608e64f77e1878ec3201073b8180c728 
I see this:


===
Checking the From: line, or any other header, works much the same:

header LOCAL_DEMONSTRATION_FROM From =~ /test\.com/i
score LOCAL_DEMONSTRATION_FROM  0.1

Now, that rule is pretty silly, as it doesn't do much that a 
blacklist_from can't.

===

What I want to do is blacklist_to [EMAIL PROTECTED], but with a 
score of 3 (ie, it's not really a blacklisting). The quote above seems 
to suggest I can do that, but I can't see it in the docs. Question 
two: is it possible to set a score on a blacklisted address?

No, unless you reset the score for all blacklist_to's
 score USER_IN_BLACKLIST_TO 3.0

When I said it "doesn't do much that a blacklist_from can't", I didn't 
mean to say there's nothing it can do that a blacklist_from/to can't.. 
there's just not much. Custom per-address scoring, using a full regex 
instead of a file-glob, and per-address combinations with other rules in 
a meta are things blacklist_from/to can't do that  a rule can.
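
(As a sketch, with an invented rule name and address, a per-address rule 
with its own score would look something like this:)

header   LOCAL_TO_FWD_ACCOUNT  ToCc =~ /someone\@originaldomain\.com/i
describe LOCAL_TO_FWD_ACCOUNT  Mail addressed to the heavily-forwarded account
score    LOCAL_TO_FWD_ACCOUNT  3.0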




Thanks. That all makes sense. I was reading too much into the remark. As 
a side note, in my perusal of the documentation, I didn't stumble easily 
on the link between the blacklist_to option and the USER_IN_BLACKLIST_TO 
rule.




Finally, I can use header ToCC, and that'll probably do, but I wanted 
to know if there's a better way.

That's the best way I know of. Also, be aware that unless your MTA drops 
hints about the recipient in the Received: headers with a "for" clause, 
SA won't know who the real recipient is when a message is BCC'ed. This 
is important, as lots of spam is effectively BCC'ed (i.e.: actual 
recipient is in the envelope, but not the To: or Cc:), so your ToCC may 
not match spam.


Understood. That's part of the reason I didn't take to this solution 
originally. I assumed that the blacklist_to option would fetch the real 
recipient out of the received headers (which, as you can see above, do 
contain the "for" clause).


Thanks for the help.

Chris


Re: Forwarded spam

2008-07-31 Thread Chris Lear

* Matus UHLAR - fantomas wrote (31/07/08 14:07):

On 31.07.08 11:05, Chris Lear wrote:
I'm trying to improve the effectiveness of a spamassassin installation, 
and there's one user who gets a lot of spam that is forwarded from 
another address, which effectively kills the network tests and in some 
cases messes with the BAYES score as well. I want to get rid of it.


many tests (e.g. those that check for dynamic IP) use the last external IP, which
means some network checks will still be killed by such a forwarder.


I seem to remember someone saying a while ago that it's not clear to the 
average spamassassin admin (eg me) which rules use trusted and which use 
external. Is there either a place that explains it all - or is there 
some logic that anyone can tell me? Not crucial, but I'm interested.




I think it's the forwarder who has to take care of spam... any further
forwarding blurs the difference between ham and spam...


I agree entirely.

Chris


Re: Removing message/rfc822 attachments to separate files

2005-07-28 Thread Chris Lear
* Herb Martin wrote (28/07/2005 06:21):
[...]
> 
> After writing the following and trying 
> Mail::SpamAssassin::Message (off and on all afternoon)
> I stumbled upon the tool intended for the job:
> 
> MIME::Parser from MIME::Toolkit (which was already on
> my system) -- the pod doc examples had almost exactly
> what I need (added one line to first example):
> 
>  le=MIME%3A%3AParser>
> 
> This does it -- the whole thing -- if I don't mind 
> submitting one file per run (with a command script
> loop for all of them of course):
> 
> #!/usr/bin/perl -w
> 
> use MIME::Parser;
> 
> my $parser = new MIME::Parser;   # Create parser
> $parser->output_dir("./tmp");# Give output dir
> $parser->extract_nested_messages(0); # Extract messages whole?
> $entity = $parser->parse(\*STDIN);   # Parse an input filehandle  
> print "Entity: $entity\n\n" if $entity;
> 
> __END__
> 

I use a similar thing. Sorry for not posting about it earlier - I
thought you had a better solution with Mail::SpamAssassin::Message.
One thing to watch out for if any Thunderbird users want to use it:
Thunderbird's attachments will be extracted with spaces in the filenames
(because the filenames are the message subjects) and sa-learn doesn't
handle them well. I use this to fix it:
http://www.tenacious.us/projects/code/nospace.pl

--
Chris


Bayes expiry/oddity

2005-09-23 Thread Chris Lear
I'm running a reasonably small site-wide spamassassin, and I use a
site-wide bayes db. Spamassassin runs as the user spamd.

I noticed that I got spam last night with no BAYES_XX markup. I looked
into it this morning, and discovered that the bayes db only has 47 spam
messages in it (nspam from sa-learn --dump magic). It has about 69000
ham. It must have gone from >200 spams at around 11pm last night to <50
this morning, and the only explanation I can think of is that the spam
has been expired, but on the other hand this seems odd.

Spamassassin learnt 143 messages as spam yesterday (according to my
logs). In the same period it learnt 291 as ham. These figures are
reasonably representative of the traffic (on weekdays, anyway)

Can anyone explain what happened to the bayes db? It's now steadily
auto-learning itself back to normal, but we are going to get many more
false negatives today I think.
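
(If expiry does turn out to be the culprit, I suppose the thing to play 
with is the expiry configuration in local.cf; something like this, with 
made-up values, would loosen or disable auto-expiry while I investigate:)

# note: bayes_expiry_max_db_size is counted in tokens, not messages
bayes_auto_expire        0
bayes_expiry_max_db_size 300000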

Any information/explanation appreciated.

Chris

PS I think it's extremely unlikely that there's been a concerted
attack/mistake by users using sa-learn the wrong way and re-learning the
spam as ham. For one thing, spamassassin is called by exim during the
smtp phase, and if the e-mail is marked as spam it's never delivered to
anyone. For another thing, there's nobody else around that knows what
sa-learn is.


Re: Bayes expiry/oddity

2005-09-23 Thread Chris Lear
* Chris Lear wrote (09/23/05 10:34):
> I'm running a reasonably small site-wide spamassassin, and I use a
> site-wide bayes db. Spamassassin runs as the user spamd.
> 
> I noticed that I got spam last night with no BAYES_XX markup. I looked
> into it this morning, and discovered that the bayes db only has 47 spam
> messages in it (nspam from sa-learn --dump magic). It has about 69000
> ham. It must have gone from >200 spams at around 11pm last night to <50
> this morning, and the only explanation I can think of is that the spam
> has been expired, but on the other hand this seems odd.
> 
> Spamassassin learnt 143 messages as spam yesterday (according to my
> logs). In the same period it learnt 291 as ham. These figures are
> reasonably representative of the traffic (on weekdays, anyway)
> 
> Can anyone explain what happened to the bayes db? It's now steadily
> auto-learning itself back to normal, but we are going to get many more
> false negatives today I think.
> 
> Any information/explanation appreciated.

None forthcoming, so I'm putting this down to a freak bayes database
corruption. sa-learn --dump magic now shows 161 spam and 69310 ham
learnt, and I'm letting it sort itself out. In about 3 months I guess it
will be back to normal :-).
Spamassassin works fairly well without bayes, so I don't mind too much,
but I would feel happier if I thought that what happened was understandable.

Chris


Re: How can i block this?

2005-10-12 Thread Chris Lear
* Matt Kettler wrote (10/11/05 19:37):
> Alessio wrote:
>> I have received this mail, the heading "from" is blank! Is possible? 
> 
> Yes, it's quite normal and is called a message with a "null return path".

Is it? I thought the return path (or envelope sender) was quite distinct
from the From: header in the message itself.
Bounce messages usually have From: headers (normally showing
[EMAIL PROTECTED]).

A blank From: header is possible, but it's unusual in normal mail from MUAs.

Chris


Re: How can i block this?

2005-10-12 Thread Chris Lear
* mouss wrote (10/12/05 13:13):
> Chris Lear wrote:
> 
>>* Matt Kettler wrote (10/11/05 19:37):
>>  
>>
>>>Alessio wrote:
>>>
>>>
>>>>I have received this mail, the heading "from" is blank! Is possible? 
>>>>  
>>>>
>>>Yes, it's quite normal and is called a message with a "null return path".
>>>
>>>
>>
>>Is it? I thought the return path (or envelope sender) was quite distinct
>>from the From: header in the message itself.
>>Bounce messages usually have From: headers (normally showing
>>[EMAIL PROTECTED]).
>>
>>A blank From: header is possible, but it's unusual in normal mail from MUAs.
>>
>>  
>>
> while the OP seems confused (he said: heading "from"), his logs show he 
is talking about the envelope sender ("from=<>" of his sendmail or whatever).

I see. Sorry.

Chris


SARE_URI_EQUALS false positives

2005-12-23 Thread Chris Lear
I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is
therefore skewing the scoring of some mail quite badly.
The weird thing is that the uris that spamassassin is complaining about
aren't uris at all. The mail in question is auto-created reports of cvs
diffs, so it's slightly unusual.
I've tried to condense the debug information. Here it is:

This is some of the output from spamassassin -D:
[16733] dbg: uri: parsed uri found, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, updated.by=Mis
[16733] dbg: uri: parsed uri found, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: parsed uri found, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated
[16733] dbg: uri: parsed uri found, http://updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated

These "parsed uris" are not links in the e-mail. They are just text.

I've had a bit of a look at the regexps that spamassassin uses to work
out what is a uri, and it seems that "updated.by=Updated" is treated as
a uri because .by is a valid tld and spamassassin looks for "schemeless"
uris, then prepends http:// for the tests.
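
(To illustrate what a uri rule then sees, here's a sketch, not the real 
SARE_URI_EQUALS definition: it runs against spamassassin's parsed uri 
list, so the conjured http://updated.by=Updated would match it.)

uri    LOCAL_URI_WITH_EQUALS  /^https?:\/\/[^\/]*=/
score  LOCAL_URI_WITH_EQUALS  0.1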

I'm running spamassassin 3.1.0 on perl 5.8.2.

Does anyone have any suggestions, apart from simply reducing the score
for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
guarantee that only real uris are parsed as such?

Chris


Re: SARE_URI_EQUALS false positives

2005-12-23 Thread Chris Lear
* jdow wrote (23/12/05 11:26):
> From: "Chris Lear" <[EMAIL PROTECTED]>
> 
>> I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is
>> therefore skewing the scoring of some mail quite badly.
>> The weird thing is that the uris that spamassassin is complaining about
>> aren't uris at all. The mail in question is auto-created reports of cvs
>> diffs, so it's slightly unusual.

[...]
>> 
>> I've had a bit of a look at the regexps that spamassassin uses to work
>> out what is a uri, and it seems that "updated.by=Updated" is treated as
>> a uri because .by is a valid tld and spamassassin looks for "schemeless"
>> uris, then prepends http:// for the tests.
>> 
>> I'm running spamassassin 3.1.0 on perl 5.8.2.
>> 
>> Does anyone have any suggestions, apart from simply reducing the score
>> for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
>> guarantee that only real uris are parsed as such?
> 
> Before you drop the score precipitously check if there is some other
> characteristic of the emails that trigger falsely which can be used to
> apply a negative score. If there is such a characteristic then generate
> the appropriate negative score. If not weigh how effective the rule is
> for you. The version of "sa-stats.pl" that is on the SARE site helps
> figure this out nicely.
> 
> That said it's close to a "50/50" rule that hits on very few messages
> here so should have a low score. (It hit on 6 messages out of 75000.)
> Cutting it out completely here seems like it would be effective TODAY.
> That could change. At one time it was quite necessary. Spammer fads
> change.)

I've reduced the score, and a quick check shows that that rule hits
almost nothing anyway, so it's not a big problem. The bayes rules were
keeping the false positives from doing much damage, anyway.
But spamassassin uses uris for lots of things, and if it's commonly
parsing (reasonably) normal text as uris, I would expect that to be a
problem in more rules than just SARE_URI_EQUALS.

Chris


Re: SARE_URI_EQUALS false positives

2005-12-23 Thread Chris Lear
* jdow wrote (23/12/05 12:06):
> From: "Chris Lear" <[EMAIL PROTECTED]>
>>* jdow wrote (23/12/05 11:26):
>>> From: "Chris Lear" <[EMAIL PROTECTED]>
>>> 
>>>> I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is
>>>> therefore skewing the scoring of some mail quite badly.
>>>> The weird thing is that the uris that spamassassin is complaining about
>>>> aren't uris at all. The mail in question is auto-created reports of cvs
>>>> diffs, so it's slightly unusual.
>> 
>> [...]
>>>> 
>>>> I've had a bit of a look at the regexps that spamassassin uses to work
>>>> out what is a uri, and it seems that "updated.by=Updated" is treated as
>>>> a uri because .by is a valid tld and spamassassin looks for "schemeless"
>>>> uris, then prepends http:// for the tests.
>>>> 
>>>> I'm running spamassassin 3.1.0 on perl 5.8.2.
>>>> 
>>>> Does anyone have any suggestions, apart from simply reducing the score
>>>> for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
>>>> guarantee that only real uris are parsed as such?
>>> 
>>> Before you drop the score precipitously check if there is some other
>>> characteristic of the emails that trigger falsely which can be used to
>>> apply a negative score. If there is such a characteristic then generate
>>> the appropriate negative score. If not weigh how effective the rule is
>>> for you. The version of "sa-stats.pl" that is on the SARE site helps
>>> figure this out nicely.
>>> 
>>> That said it's close to a "50/50" rule that hits on very few messages
>>> here so should have a low score. (It hit on 6 messages out of 75000.)
>>> Cutting it out completely here seems like it would be effective TODAY.
>>> That could change. At one time it was quite necessary. Spammer fads
>>> change.)
>> 
>> I've reduced the score, and a quick check shows that that rule hits
>> almost nothing anyway, so it's not a big problem. The bayes rules were
>> keeping the false positives from doing much damage, anyway.
>> But spamassassin uses uris for lots of things, and if it's commonly
>> parsing (reasonably) normal text as uris, I would expect that to be a
>> problem in more rules than just SARE_URI_EQUALS.
> 
> That is a standalone rule.
> 
> And I do note that many of the SARE rules have severe problems in very
> specific cases. There are some mailing lists that are not well filtered
> for spam which have postings which trigger some of the "too effective
> to toss" SARE rules. I've developed some massive meta rules to at least
> partially get a handle on the problem. (A number of times XXX hit option
> would be nice to have for this.)

Sorry to go on, but I wonder whether you've missed my point. The
SARE_URI_EQUALS rule is working fine. It just looks in the uris that
spamassassin gives it, and complains when they contain "=".
The problem is that spamassassin is treating things that aren't uris as
uris. So SARE_URI_EQUALS is working on dud data.

In this specific case, the e-mail contains the text
"updated.by=Updated". This is not a uri, and nor should it be treated as
one. But spamassassin thinks it is (because .by is a valid tld), so, as
far as I can tell, *all* uri rules will check it. It so happens that
SARE_URI_EQUALS hits in this case, but other uri rules are vulnerable to
false positives if the uri parsing is wrong, aren't they?

Chris


Re: SARE_URI_EQUALS false positives

2006-01-03 Thread Chris Lear
* Loren Wilton wrote (24/12/2005 00:23):
>> Does anyone have any suggestions, apart from simply reducing the score
>> for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
>> guarantee that only real uris are parsed as such?
> 
> Several.

Hi. Thanks for the response. I'm replying rather late due to pressures
of Christmas.

> 
> 1.Change your report generator to remove the extraneous dot between
> updated and by.  Or change it to the more common underscore, if you insist
> on these words being connected for some reason.
> 
> 2.Put spaces around the equal sign.

These are fine suggestions, but sadly not practical. The e-mails are
auto-generated diffs from cvs commits. The files being committed are
java properties files. In particular, the "updated.by" property contains
internationalised versions of the phrase "Updated by". The "more common
underscore" would be unusual in the java properties file, and expecting
the developers to change the way they work to avoid SARE misfires is a
slightly overzealous reaction to the spam problem, I think. However, it
is possible if there's no sensible alternative.
The second suggestion is only a workaround, not a fix, anyway, because
spamassassin will still check http://updated.by as a uri.

> 
> 3.If you are reluctant for the correct fix, drop the score on the
> uri_equals rule to 4 or maybe 3, depending on what else your report manages
> to hit.

I am reluctant to use the "correct fix". Actually I'm inclined to think
that the word "correct" is being misapplied here. I've changed the
scores appropriately, though.
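
(In local.cf terms, that's just something like this, using the sort of 
value Loren suggests:)

score SARE_URI_EQUALS 3.0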

> 
> 4.You could submit a Bugzilla on the parsing of that phrase.  But
> frankly I consider the bug in the report generation, not SA's parsing of
> strange syntax.

The reason I didn't submit a bug was that I was not sure there was one -
hence the original query. And I'm still not going to submit a bug,
because I'm persuaded that there is not one. What bothered me (and still
does a bit) was that the string "updated.by=anything" matches a rule
that looks for uris of the form "http(s)://*=*". Ie the http(s) is
conjured out of nowhere for schemeless uris. I can see the point, but I
thought it would be worth bringing a possible problem to light. It's a
possible problem, not a bug per se, and the subsequent discussion shows
that people take different views on the seriousness of this kind of
parsing issue. One thing that hasn't been mentioned in respect of this
is that if spamassassin is looking aggressively for schemeless uris, it
could in some cases create quite a lot of unwanted uri checking traffic.

I'm happy to stick with what I've got now. I've sent some examples off
as indicated so that the SARE corpus will contain my mail in future.

Chris


Re: Another URL obfuscation

2006-01-10 Thread Chris Lear
* Jeff Chan wrote (10/01/2006 15:42):
> On Tuesday, January 10, 2006, 6:17:38 AM, Larry Rosenbaum wrote:
>> I found this obfuscated URL in a drug spam:
> 
>> http://gozifo> .upze5otbbutzanbb655k685ys5nn%2Eridgykh=
> com">>
> 
> Good grief, does any mail client actually parse that as a
> functional URI?

Yes. In your e-mail, my Thunderbird created a clickable link to
http://gozifo
My IE gives a DNS error when it tries that address.
My FireFox redirects to
http://www.google.com/search?btnI=I%27m+Feeling+Lucky&ie=UTF-8&oe=UTF-8&q=gozifo
which in turn redirects to http://www.vojir.com/other/basic-myebol.html
which gives a 404 error. It's probably possible to turn this
(mis)feature off in FireFox, but there it is by default.

I have no idea whether this is the original intention of the
obfuscation. I would guess not - and if it's viewed as html to start
with that might make a difference.

Chris


Re: Could you scan your logs for me?

2006-02-03 Thread Chris Lear
* Ole Nomann Thomsen wrote (03/02/06 09:27):
> Hi, can I ask a small favor from some of you running SA with Bayes enabled:
> Please run the following perl-oneliner on your SA-log (mine is "current"):
> 
> perl -ne 'if (/result:/) {$n++; $b++ if (/BAYES/);} } print $b/$n,"\n"; {' <
> current
> 
> (I promise it's not a rootkit :-)
> 
> I get:
> 0.710109622411693
> 
> I suspect you really ought to see 1, always. What do you get?

0.960777058279371

In my case, the difference is attributable to this in local.cf:

bayes_ignore_to users@spamassassin.apache.org
whitelist_to users@spamassassin.apache.org

Chris


Re: Easyjet e-mail scoring very high

2007-01-08 Thread Chris Lear
* Chris Lear wrote (01/12/06 16:57):
> * Adam Stephens wrote (01/12/06 16:10):
>> Chris Lear wrote:
>>> * Loren Wilton wrote (01/12/06 14:54):
>>>   
>>>>> The html contains this sort of thing:
>>>>> http://www.easyjet.com/EN/Members/
>>>>>
>>>>> Which looks like the culprit. In fact, every full stop in the html is
>>>>> represented as &#46; for some reason.
>>>>>
>>>>> Still wondering though... how do you solve a problem like EasyJet?
>>>>>   
>>>> Sure looks like spam to me.  ;-)
>>>>
>>>> Which also looks like just about every airline message I've seen from any 
>>>> airline.  :-(  Apparently they hired spammers to design their marketing 
>>>> campain mail.
>>>>
>>>> You could try sending to mostmaster or whatever at whichever marketing 
>>>> company is really sending that mail and see if you can get any attention 
>>>> from them.  Probably not, but it might be worth trying.
>>>> 
>>>
>>> The trouble is, it's not marketing. It's a confirmation of a flight
>>> booking, which I paid for. The airline doesn't issue tickets. So it's
>>> something I genuinely want in my inbox. It looks like it's generated
>>> directly by the easyjet.com web server.
>>>   
>> 
>> I had some complaints about that this week; it's obviously a new issue, 
>> and it looks like it only applies to the ticket confirmations. Since 
>> people really need these booking confirmations I've whitelisted it - 
>> using a whitelist_from_rcvd rule seems to catch the booking 
>> confirmations only as the marketing material is sent from a different 
>> machine.
> 
> Thanks for all the advice. I've reluctantly whitelisted them and written
> a polite message to [EMAIL PROTECTED] It doesn't seem to have
> bounced, so maybe someone will read it. I'll let you know if I get a
> response.
> Meanwhile, I suppose this is something for others to be aware of if you
> run an mta that rejects on high SA scores (and have users that might
> want to fly EasyJet).
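
(For reference, the sort of whitelist_from_rcvd entry Adam described 
would look like this; both the address and the relay domain here are 
invented and would need to match the real headers:)

whitelist_from_rcvd  bookings@easyjet.com  easyjet.com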

This thread is ancient now, but here's a followup: I never got a
response from Easyjet, but I did get (today) a replica of the original
e-mail. It's almost identical (same appalling html, still from
savvis.net, but from a different ip), but missing a chunk of advertising
(hotels, car rental, etc), and with some very slightly different wording
about hand luggage.

The new version hits these rules:

DNS_FROM_RFC_ABUSE,
FORGED_RCVD_HELO, [this is new]
HTML_FONT_FACE_BAD,
HTML_MESSAGE,
HTML_TINY_FONT,
MIME_HTML_MOSTLY,
SARE_OBFU_AMP2B,
SARE_SPEC_LEO_LINE03a,
USER_IN_WHITELIST [because I whitelisted them]

For comparison, the original hit these rules:

DNS_FROM_RFC_ABUSE
HTML_FONT_FACE_BAD
HTML_MESSAGE
HTML_TINY_FONT
MARKETING_PARTNERS [This has gone]
MIME_HTML_MOSTLY
MPART_ALT_DIFF [This has gone]
SARE_OBFU_AMP2B
SARE_SPEC_LEO_LINE03a

Chris


Re: Techworld says "spam shows sudden slide'?

2007-01-12 Thread Chris Lear

Tony Finch wrote:

On Thu, 11 Jan 2007, Michael Scheidell wrote:


I don't think I see any sudden drop, was the worlds #1 spammer in that
hut in fluga that got bombed last night?


I haven't seen any drop recently either. For my systems (daily legit
volume 300,000 and spam 10x that) the spam peak was in the first half of
November and levels have been fairly constant (but with a level slightly
lower than the peak) since then.


I noticed a significant (absolute) drop towards the end of November. I 
put it down to a change of tactics: a reduction in the number of 
repeat-the-same-message-with-small-differences spam. These were 
previously skewing our stats upwards, because effectively the same spam 
from the same machine was being sent ~10-15 times to the same user with 
small text changes (we were rate-limiting connections to reduce the SA 
cost). This seems to be rarer now, or maybe even abandoned as a 
technique by spammers.


Chris


Re: complete false hits for BASE64 and LW_STOCK_SPAM4

2007-02-09 Thread Chris Lear
* Loren Wilton wrote (08/02/07 19:46):
>> As for LW_STOCK_SPAM4, it's being triggered by the fact that the message
>> is base-64 encoded text AND has a Date: header that's missing a proper
>> timezone. Apparently a batch of stock spam went out at some point with
>> both of these abnormal features. I have to admit, it's a pretty rare
>> combination.
>>
>>> Date: February 6, 2007 9:52:29 AM PST
>>
>> That, properly, should read something like this:
>>   Date: Wed, 06 Feb 2007 09:52:29 -0800
> 
> Actually LW_STOCK_SPAM4 was written on 02/19/2006, and is looking for a 
> Base64 encoded message that has a valid timezone that is specifically 
> "\s\+", not an invalid time zone.
> 
> Internally I have it scored at 5 points and haven't had a problem with it, 
> but people don't send me messages from Blackberrys.
> 
> I suppose a blackberry might not have a clock, so it sends all messages as though 
> they came from London regardless of where they are.  That would somewhat 
> surprise me, since cell phones certainly know where they are and what time 
> it is.  But if Verizon is involved then it is certainly possible that the 
> software has been deliberately crippled in a number of ways, and creating a 
> proper date header might be one of those deliberate malfunctions.
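
(Purely to illustrate the shape of such a rule, and emphatically not the 
real LW_STOCK_SPAM4 definition, a combination of those two features might 
be written like this:)

header __LOCAL_DATE_TZ_PLUS  Date =~ /\s\+\d{4}\s*$/
meta   LOCAL_B64_PLUS_TZ     MIME_BASE64_TEXT && __LOCAL_DATE_TZ_PLUS
score  LOCAL_B64_PLUS_TZ     1.0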


Just to confirm that this unmodified rule does hit some legit blackberry
e-mail, here's an example (apologies for the obfuscation, but I've only
messed with addresses. It's not my e-mail):

Return-path: 
Envelope-to: 
Delivery-date: Wed, 07 Feb 2007 17:21:42 +
Received: from smtp02.bis.eu.blackberry.com ([216.9.253.49])
by mail.barcombe.net with esmtp (Exim 4.63)
(envelope-from )
id 1HEqUG-0008Ku-IV
for my wife's address; Wed, 07 Feb 2007 17:21:41 +
Message-ID:
<[EMAIL PROTECTED]>
Content-Transfer-Encoding: base64
Reply-To: the sender
References: <[EMAIL PROTECTED]>
In-Reply-To: <[EMAIL PROTECTED]>
Sensitivity: Normal
Importance: Normal
To: "My Wife" 
Subject: Re: 25th august
From: the sender
Date: Wed, 7 Feb 2007 17:22:58 +
Content-Type: text/plain; charset="Windows-1252"
MIME-Version: 1.0
X-AntiVirus: Clean
X-Spam-Score: 2.1
X-Spam-Level: ++
X-Spam-Report: Barcombe.net spam report: Score = 2.1.
Tests=BAYES_00=-2.599,LW_STOCK_SPAM4=1.66,MIME_BASE64_NO_NAME=0.224,MIME_BASE64_TEXT=1.885,NO_REAL_NAME=0.961

A bit of grepping suggests that LW_STOCK_SPAM4 has hit 5 ham and 3 spam
(all scoring 20+) on that server since about November. So its usefulness
is perhaps questionable. Normal disclaimer applies: this is only one
low-traffic server. I live in the UK which might make the + timezone
more likely.

[Also see the thread "Blackberry email"]

Chris (whose mail from blackberries has all been received OK)


Re: New stock spam (2/14/07)

2007-02-15 Thread Chris Lear

* Jonathan Nichols wrote (15/02/07 05:19):

Maciej Friedel wrote:

On 02/14/07 Jonathan wrote:


http://www.pbp.net/~jnichols/spam2.txt

0.0 BOTNET_NORDNS IP address has no PTR record
0.1 HTML_50_60 BODY: Message is 50% to 60% HTML  
0.0 HTML_MESSAGE BODY: HTML included in message
1.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% 
[score: 0.5002]

5.0 BOTNET The submitting mail server looks like part of a Botnet

i think botnet is a good idea

maciek



I thought botnet was unstable.. is it working ok now?


It's not (in my experience) unstable. It's excellent. But the default 
score of 5 is way too high. It gets a lot of false positives, especially 
(again, in my experience) from small mail-order operations who don't 
understand dns (Exchange users, I rather uncharitably assume). I score 
botnet at 2 and I'm very happy with it.
I reckon better network tests are the future of spam filtering, now that 
spammers are sending blocks of text from Harry Potter books along with 
undetectable URLs containing spaces etc.
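
(In local.cf that's just a one-line override:)

score BOTNET 2.0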


Chris


Re: Rules report

2007-04-19 Thread Chris Lear

* Matt Kettler wrote (19/04/07 14:49):

Matt Kettler wrote:

If you try to build it off a live feed and use SA's marking as the spam
criteria, your statistics are useless. Any rule with a high enough score
would get "perfect" results.. all the mail it matched would be spam, and
no nonspam. You have, essentially, created a "self fulfilling prophecy".
The higher-scoring a rule is, the more likely messages that match it
will be tagged as spam, even if they're not really spam.
  

Self correction. Such stats aren't "useless", it depends on what you
want out of them.

If you want to know how accurate a particular rule is, by comparing the
spam vs nonspam hit rates, those stats are useless, because of the bias.
You need a manually sorted corpus to get this kind of information.

If you want to see which rules are getting used a lot, vs those that are
rarely getting used, these stats are quite useful.

If you want a "top x rules" list, sa-stats can do that for you:

http://www.rulesemporium.com/programs/sa-stats.txt


http://www.rulesemporium.com/programs/sa-stats-1.0.txt is probably a bit 
better in this case.




It will parse a spamd logfile and report the most-frequently used spam
and nonspam rules (and you can configure how many it will list for each)


The 1.0 version can do per-domain and per-user info, given a 3.1 log.

Chris


Re: URIBL_BLACK matching on messages with no URLs in them...

2007-07-02 Thread Chris Lear

Jo Rhett wrote:
Note: yes, uribl has their own mailing list.  That server has been down 
for quite some time, so I gave up and posted it here in case someone is 
dual listed and can fix it.


There's no URL in this message.  What is it mis-matching against?


This has been answered, but, if you're still interested, also see 
http://marc.info/?l=spamassassin-users&m=113533589419731&w=2 with 
details of a similar problem.


Chris


Re: Simply don't run spam for Mailing Liste

2005-04-28 Thread Chris Lear
* arnaud wrote (27/04/2005 23:06):
> Kris Deugau wrote:
> 
[...]
>> In my case, for instance, SA is called from procmail just before the
>> message is written to a mailbox.  In my .procmailrc file, I have a
>> number of procmail recipes that look something like this:
>> 
>> # SATalk
>> :0:
>> * ^List-Id: 
>> /home/kdeugau/mail/spam-stomping
>> 
>> This one files messages from this list in the spam-stomping folder
>> before SA even sees the message.  I have quite a long list of similar
>> entries for other mailing lists.
>> 
>> -kgd
> 
> OK, thank you. As you can see, I hadn't understood this option. I use 
> exiscan with exim. It would be better, I suppose, to run spamassassin 
> from procmail, which I use too.

Or use exim configuration rules to prevent scanning of certain messages.
If you are using exim's acls (either exim 4.50+ or older exim with the
exiscan-acl patch), something like this should work:


[in main config]
acl_smtp_rcpt = acl_check_rcpt
acl_smtp_data = acl_check_content

[in acls]
acl_check_rcpt:
[...]
# Set acl_m0 variable to tell the later acl not to use SA
accept hosts = veronyk.net : freetelecom.com
  set acl_m0 = dontcheckdata

[...]

acl_check_content:
# Skip all content checks if acl_m0 variable set
  accept condition = ${if eq{$acl_m0}{dontcheckdata}{1}{0}}
[...]
  deny  message = I don't like your nasty spam
        spam = spamd:true/defer_ok
        condition = ${if >{$spam_score_int}{80}{1}{0}}
[...]


Re: OT: The highest score?

2005-05-04 Thread Chris Lear
* Chris wrote (05/04/05 01:27):
> On Sunday 01 May 2005 04:49 pm, John Andersen wrote:
>> On Sunday 01 May 2005 02:02 am, Roman Serbski wrote:

[...]

> *  104 SARE_FORGED_EBAY Message appears to be forged, (ebay.com)

[...]

The SARE_FORGED_* rules are a good way to score over 100 points quickly.
When I first installed SARE I had some very high-scoring *ham* (hitting
SARE_FORGED_CITI; the SARE people don't work in the banking sector, it
seems), and, as a result, some crazy AWL scores afterwards. I've removed
the SARE forged rules now altogether, and most of the remaining spam
scores under 50 (just one 52.9 yesterday).

Chris


Re: OT: Confession and rage

2005-05-06 Thread Chris Lear
* Stewart, John wrote (05/06/05 15:55):

[... excellent story chopped ...]

> Do I:
> 
> - Never go there again, as I said would be the case in my previous email?
> 
> - Show up and try to convince her what a horrible thing she is doing?
> 
> - Just screw with their (horribly insecure) online site, signing up for
> appointments all day for Elmer Fudd, etc?
> 
> - Simply ban their domain from my mailserver and report them to the RBLs?

Or...

- Offer them some consultancy, in return for a haircut (is this the same
as option 2?)

-- Chris


Re: how to config SA to scan mail from localhost

2005-05-10 Thread Chris Lear
* Evan Platt wrote (10/05/2005 05:21):
> At 09:16 PM 5/9/2005, you wrote:
>>I'm testing SA but my server can't connect to the outside world. Thus, 
>>I have to send mail from localhost to myself to find out how accurate SA is.
>>Unfortunately, SA doesn't scan mail that is sent from localhost.
>>
>>How can I reconfigure it to scan every mail?
> 
> You don't. You tell spamassassin what mail to scan. How are you calling 
> spamassassin, and what is your mail configuration? 
> 

The original question is a restatement of yesterday's "how to force SA
to scan mail that send from php" post.

My reading of the situation (which might be wrong) is this:

The Original Poster wants to do some sort of project that will give
statistics on the accuracy of spamassassin. He has followed a recipe
that installs qmail with qmail-scanner, and has got a php script that
will send mail to the mail server. But the mail server appears to skip
the scan for local messages, so the project is getting no statistics.

The solution to this problem is to work out how qmail-scanner decides
what to scan, and change it. Unfortunately, I can't help there. I would
try doing a manual smtp connection from the local machine (telnet
localhost 25) and take it from there.

But my worry is that sending a load of e-mail via a php form will
produce hopeless project results, because it will effectively only test
the value of spamassassin's body checks. But perhaps that's part of the
plan.

--
Chris


SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251

2005-05-20 Thread Chris Lear
I've been running quite a lot of sare rules on a site-wide SA
installation for a month or two now. I've been keeping a fairly close
eye on it, and there have been few false positives generally.

But today I noticed that several e-mails are hitting both
SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from
(one specific address in) Ukraine to a Ukrainian in England, written in
English.
The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so
only bayes saves it from being rejected (we reject at >5.5).

I can re-score these rules (or remove sare_header0, which will lower the
scores anyway), but I have 2 questions:
- Is this a slightly unfair double-scoring?
- Are there any other similar rules I should worry about, given that
some Russian mail to this server is ham?
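
(The re-scoring I have in mind is just something like this in local.cf, 
with values picked more or less arbitrarily:)

score SARE_CHARSET_W1251    1.0
score SARE_FROM_CHAR_W1251  1.0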

--
Chris


Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251

2005-05-20 Thread Chris Lear
* John Wilcock wrote (05/20/05 10:51):
> Chris Lear wrote:
>> But today I noticed that several e-mails are hitting both
>> SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from
>> (one specific address in) Ukraine to a Ukrainian in England, written in
>> English.
>> The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so
>> only bayes saves it from being rejected (we reject at >5.5).
>> 
>> I can re-score these rules (or remove sare_header0, which will lower the
>> scores anyway), but I have 2 questions:
>> - Is this a slightly unfair double-scoring?
>> - Are there any other similar rules I should worry about, given that
>> some Russian mail to this server is ham?
> 
> These are actually in the header1 file, not header0, but surely they 
> ought to be moved to the 70_sare_header_eng.cf as they hit non-English 
> ham. Bob?

They're in my header0.cf from sare/rules du jour. And in header.cf with
a lower score as well. Have I got the wrong files?

RulesDuJour $ grep SARE_FROM_CHAR_W1251 *
70_sare_header.cf:header    SARE_FROM_CHAR_W1251  From:raw =~ /\=\?Windows-1251\?/i
70_sare_header.cf:describe  SARE_FROM_CHAR_W1251  Displays in unexpected charset
70_sare_header.cf:score     SARE_FROM_CHAR_W1251  1.666
70_sare_header.cf:#ham      SARE_FROM_CHAR_W1251  Found in some Russian ham
70_sare_header.cf:#hist     SARE_FROM_CHAR_W1251  Created by Bob Menschel May 17 2004
70_sare_header.cf:#counts   SARE_FROM_CHAR_W1251  245s/4h of 238550 corpus (112525s/126025h RM) 02/28/05
70_sare_header.cf:#counts   SARE_FROM_CHAR_W1251  640s/0h of 54176 corpus (16997s/37179h JH-3.01) 02/01/05
70_sare_header.cf:#counts   SARE_FROM_CHAR_W1251  0s/0h of 17050 corpus (14617s/2433h MY) 08/08/04
70_sare_header0.cf:header    SARE_FROM_CHAR_W1251  From:raw =~ /\=\?Windows-1251\?/i
70_sare_header0.cf:describe  SARE_FROM_CHAR_W1251  Displays in unexpected charset
70_sare_header0.cf:score     SARE_FROM_CHAR_W1251  4.000
70_sare_header0.cf:#stype    SARE_FROM_CHAR_W1251  spamgg
70_sare_header0.cf:#hist     SARE_FROM_CHAR_W1251  Created by Bob Menschel May 17 2004
70_sare_header0.cf:#counts   SARE_FROM_CHAR_W1251  180s/0h of 66979 corpus (41757s/25222h RM) 09/04/04
70_sare_header0.cf:#counts   SARE_FROM_CHAR_W1251  209s/0h of 38398 corpus (14914s/23484h JH) 08/14/04 TM2 SA3.0-pre2
70_sare_header0.cf:#counts   SARE_FROM_CHAR_W1251  0s/0h of 17050 corpus (14617s/2433h MY) 08/08/04


--
Chris


Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251

2005-05-20 Thread Chris Lear
* John Wilcock wrote (05/20/05 12:15):
> Chris Lear wrote:
>> They're in my header0.cf from sare/rules du jour. And in header.cf with
>> a lower score as well. Have I got the wrong files?
> 
> Methinks you have an old header0.cf that is no longer being updated - 
> these rules aren't in the current header0 on rulesemporium.com.

OK, thanks. I'll try to find out what's wrong with my Rules du Jour.

> 
> And in any case you shouldn't be using header and header0 together...

I didn't know that. I'll fix that as well.

Thanks for your help.

--
Chris


Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251

2005-05-20 Thread Chris Lear
* Robert Menschel wrote (05/20/05 15:13):
> Hello Chris, John,
> 
> Friday, May 20, 2005, 3:47:55 AM, you wrote:
> 
>>>> I can re-score these rules (or remove sare_header0, which will lower the
>>>> scores anyway), but I have 2 questions:
>>>> - Is this a slightly unfair double-scoring?
>>>> - Are there any other similar rules I should worry about, given that
>>>> some Russian mail to this server is ham?
>>> 
>>> These are actually in the header1 file, not header0, but surely they
>>> ought to be moved to the 70_sare_header_eng.cf as they hit non-English
>>> ham. Bob?
> 
> CL> They're in my header0.cf from sare/rules du jour. And in header.cf with
> CL> a lower score as well. Have I got the wrong files?
> 
> Yes, your header0 is old.  Both rules are in header1 in the current
> versions. You need to fix your RDJ for header0, or just delete it,
> since header0 through header3 are included in header.cf
> 
> Yes, you can and maybe should provide a lower score, at least
> temporarily.
> 
> Yes, they should be moved to header_eng, and will be this weekend.

Thanks for all this. I've been educated.

> 
> Meanwhile, is it possible for you to send me some samples of the ham?
> If I add that to my corpus, it'll be taken into account in the next
> rescoring.

Sent under separate cover.

--
Chris


Re: Unsubscribing

2005-07-15 Thread Chris Lear
* Duane Hill wrote (07/15/05 10:49):
> On Friday, July 15, 2005 at 9:45:17 AM, [EMAIL PROTECTED] confabulated:
> 
>> I am shortly to go on hols for 2 weeks and so was planning to
>> unsubscribe until I get back. I notice on the web page at
>> http://wiki.apache.org/spamassassin/MailingLists
> 
>> it tells you how to subscribe
> 
> And in the headers of all messages to the list state this:
> 
> list-help: 
> list-unsubscribe: 
> List-Post: 

Which helps. The OP's suggestion was...

>> [...] I would like to suggest that
>> unsubscribe details be added to the page.

I think this is a reasonably sensible suggestion.

>> I also notice that I seem to
>> be subscribed to two spamassassin lists, not sure how that happened,

And you seem to have sent mail to both at once, resulting in a
duplicate. I think that spamassassin-users@incubator.apache.org is out
of date.

>> probably user stupidity knowing me. Is there information somewhere else
>> that tells people how to unsubscribe from the list.

See the headers (as mentioned above)

--
Chris


Re: How can I correct this FalsePositive?

2005-07-15 Thread Chris Lear
* Loren Wilton wrote (07/15/05 12:02):
>> X-Spam-Status: Yes, score=2.2 required=2.0
> tests=HTML_BACKHAIR_8,HTML_MESSAGE,
>> HTML_OBFUSCATE_05_10,MIME_HTML_ONLY autolearn=no version=3.0.4
> 
> The easiest way to eliminate this FP would be to take your spam threshold
> back to 5, or at least something close to that.  The rules that hit on this
> mail have nothing whatever to do with the site - they are related to the
> mail message formatting.
> 
> Since it only got 2.2 points, nobody should really notice this.  But since
> you have set your spam cutoff way too low, it FPs for you.

...and the cheapest way to fix the message formatting, as I see it, is
to get them to fix the message so it doesn't hit this rule:

1.2 MIME_HTML_ONLY BODY: Message only has text/html MIME parts

Which should also make the message more friendly to non-HTML mail
readers, which is worthwhile anyway. And it will take the score down to 1.0.

--
Chris