Re: How to delete emails with FROM that is not in the server?

2012-08-15 Thread David B Funk

On Wed, 15 Aug 2012, Sergio wrote:


> Hello all,
> I am wondering whether there could be a rule that checks, for email
> delivered from the server, that the domain in the FROM address exists on
> the server. Is it possible?
>
> What I am looking for is to block any email that is sent from my server
> without using one of the domain accounts that belong to that server.
>
> Thank you in advance.
>
> Best Regards,
>
> Sergio Cabrera


That sort of check is best done at the SMTP-server (MTA) level. How is SA
to know who the valid users on your system are (including aliases,
forwards, etc.)?


Your SMTP server must know who your valid recipients are so it can reject
unknown users and deliver to the valid ones. So just apply the same kind of
check to the From address (i.e., if domain == us, check to make sure user ==
ours, else SMTP-REJECT). Details are MTA-specific, but most have some kind
of built-in check for doing this sort of thing.
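
The From-address rule described above can be sketched in a few lines. This is only an illustration of the logic, not the configuration of any particular MTA; the domain and user tables (`LOCAL_DOMAINS`, `LOCAL_USERS`) and the function name are hypothetical stand-ins for whatever your MTA already uses to validate recipients.

```python
# Hypothetical data: a real MTA would consult its own user/alias tables.
LOCAL_DOMAINS = {"example.com"}
LOCAL_USERS = {"alice@example.com", "postmaster@example.com"}


def check_from(address: str) -> str:
    """Return 'accept' or 'reject' for a sender address.

    Mirrors the rule from the post: if the domain is ours, the user
    must be one of ours, otherwise reject. Foreign domains are not
    judged here.
    """
    addr = address.strip().lower()
    local_part, sep, domain = addr.rpartition("@")
    if not sep or not local_part:
        return "reject"  # not a usable address at all
    if domain in LOCAL_DOMAINS and addr not in LOCAL_USERS:
        return "reject"  # claims our domain, but no such user
    return "accept"


print(check_from("alice@example.com"))    # known local user
print(check_from("mallory@example.com"))  # our domain, unknown user
print(check_from("bob@elsewhere.org"))    # foreign domain
```

To block all outgoing mail that does not use a local account, as the original question asks, the last branch would also have to reject foreign domains; that is a policy choice the sketch leaves open.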

What SA can be used for is to hit forgery spam, i.e., if the 'From'
domain is ours and the sending host isn't one we bless, hit it.

(If you have valid SPF records this is trivially easy to do.)
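
As a rough sketch of that forgery test (not an SA rule and not an SPF implementation), the decision comes down to two lookups. The domain set, the "blessed" network list, and the function name are assumptions made up for the example.

```python
import ipaddress

# Hypothetical values: our own domains and the networks allowed to send for them.
OUR_DOMAINS = {"example.com"}
BLESSED_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def looks_forged(from_domain: str, sending_ip: str) -> bool:
    """True if the From domain is ours but the sending host is not blessed."""
    if from_domain.lower() not in OUR_DOMAINS:
        return False  # not claiming to be us, nothing to check here
    ip = ipaddress.ip_address(sending_ip)
    return not any(ip in net for net in BLESSED_NETS)


print(looks_forged("example.com", "198.51.100.7"))  # our domain, outside host
print(looks_forged("example.com", "192.0.2.10"))    # our domain, blessed host
```

With published SPF records the list of blessed hosts is effectively what the SPF check already evaluates, which is why the post calls this case easy.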

--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: How to delete emails with FROM that is not in the server?

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Sergio wrote:


> Hello all,
> I am wondering whether there could be a rule that checks, for email
> delivered from the server, that the domain in the FROM address exists on
> the server. Is it possible?
>
> What I am looking for is to block any email that is sent from my server
> without using one of the domain accounts that belong to that server.


That's not what SA is for.

Read up how to configure whatever your MTA is to prevent "open relay".

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Gun Control enables genocide while doing little to reduce crime.
---
 Today: the 67th anniversary of the end of World War II


How to delete emails with FROM that is not in the server?

2012-08-15 Thread Sergio
Hello all,
I am wondering whether there could be a rule that checks, for email
delivered from the server, that the domain in the FROM address exists on
the server. Is it possible?

What I am looking for is to block any email that is sent from my server
without using one of the domain accounts that belong to that server.

Thank you in advance.

Best Regards,

Sergio Cabrera


Re: SpamAssassin Hanging on RTF Attachments

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, John Evans wrote:


> On 2012-08-15 10:15, Kevin A. McGrail wrote:
>> On 8/15/2012 11:24 AM, Henrik K wrote:
>>> On Wed, Aug 15, 2012 at 11:14:58AM -0400, Kevin A. McGrail wrote:
>>>> Henrik, why don't you think the timeout hit?
>>>
>>> Probably because regexps hanging and it's impossible to timeout
>>> them.
>>
>> Interesting. OK. I look forward to seeing if your patch helped!
>
> Kevin,
>
> I added the patch and it hung in the same place. The 'spamassassin -D -t <
> bad' command eventually went through after a LONG timeout. I didn't capture
> the results of the SA command (forgot to redirect output), but the patch to
> substr(X, 0, 3) didn't seem to help any. Could be that 30,000 is too
> long? That delves deep into the attachment, and I'm wondering if the top-end
> of the attachment (within that 30k range) has content that triggers a regex
> that doesn't like what it sees?
>
> Just throwing some ideas out there.
>
> Thoughts?


Add this to your .pre file and see if it helps isolate a specific 
poorly-performing rule:


  loadplugin HitFreqsRuleTiming ./HitFreqsRuleTiming.pm

Also do "--debug area=rules" instead of "-D", that may make the 
problematic rule more apparent.


And capture STDERR to a file! :)


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Kevin A. McGrail wrote:


> On 8/15/2012 5:18 PM, John Hardin wrote:
>> I might not go so far as to say autolearn should be disabled by default,
>> as it is a major good if well trained; but setting the defaults extreme
>> enough that it is reliably, if slowly, initially trained seems to me a
>> fair middle ground. Setting the ham default threshold to -3 or even -5
>> seems prudent (_much_ better than the current 0.1), then someone who
>> actually wants to configure it can adjust based on how well it's
>> performing and whether they want autolearn on at all.
>
> Can you open a bug about that and let's see if we can get that done? I agree
> that a slower training threshold makes sense.


https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6828



Re: SpamAssassin Hanging on RTF Attachments

2012-08-15 Thread John Evans

On 2012-08-15 10:15, Kevin A. McGrail wrote:
> On 8/15/2012 11:24 AM, Henrik K wrote:
>> On Wed, Aug 15, 2012 at 11:14:58AM -0400, Kevin A. McGrail wrote:
>>> Henrik, why don't you think the timeout hit?
>>
>> Probably because regexps hanging and it's impossible to timeout
>> them.
>
> Interesting. OK. I look forward to seeing if your patch helped!


Kevin,

I added the patch and it hung in the same place. The 'spamassassin -D 
-t < bad' command eventually went through after a LONG timeout. I didn't 
capture the results of the SA command (forgot to redirect output), but 
the patch to substr(X, 0, 3) didn't seem to help any. Could be that 
30,000 is too long? That delves deep into the attachment, and I'm 
wondering if the top-end of the attachment (within that 30k range) has 
content that triggers a regex that doesn't like what it sees?


Just throwing some ideas out there.

Thoughts?

--
John Evans


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread RW
On Wed, 15 Aug 2012 17:05:00 -0400
Kevin A. McGrail wrote:

> On 8/15/2012 5:00 PM, John Hardin wrote:
> >
> > Right. It might be prudent to review the defaults before the next 
> > major release. 
> I wonder if we shouldn't disable auto-learning by default (assuming
> it's on by default)...
> 
> Bayes should really be trained.

It seems to me that bug 6344 from 2010 has some merit. (I was about to
file something similar myself.)

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6344

This suggests that lists like RCVD_IN_DNSWL_* should be marked as
noautolearn so that when they fail they don't screw up autolearning,
which is what appears to have happened here. This is exacerbated by the
fact that autolearning won't learn against a strong Bayes result (quite
rightly), so damage can become permanent.



Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Kevin A. McGrail

On 8/15/2012 5:28 PM, JP Kelly wrote:

> Dumb question:
> How can I set the autolearn thresholds?

perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold



   bayes_auto_learn_threshold_nonspam n.nn   (default: 0.1)
       The score threshold below which a mail has to score, to be fed into
       SpamAssassin's learning systems automatically as a non-spam message.

   bayes_auto_learn_threshold_spam n.nn      (default: 12.0)
       The score threshold above which a mail has to score, to be fed into
       SpamAssassin's learning systems automatically as a spam message.

       Note: SpamAssassin requires at least 3 points from the header, and
       3 points from the body to auto-learn as spam.  Therefore, the
       minimum working value for this option is 6.

Regards,
KAM
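
To make the two thresholds concrete, here is a deliberately simplified sketch of the decision the perldoc text describes. It is not the plugin's code: the function name and its parameters are invented for illustration, and the real plugin applies further conditions (for example which rules count toward the autolearn score) that are not modeled here.

```python
def autolearn_decision(score, header_points, body_points,
                       nonspam_threshold=0.1, spam_threshold=12.0):
    """Simplified model of the documented thresholds.

    Returns 'ham', 'spam', or 'no'. Defaults are the ones quoted in
    the perldoc excerpt above.
    """
    if score < nonspam_threshold:
        return "ham"  # low enough to be learned as non-spam
    if score > spam_threshold and header_points >= 3 and body_points >= 3:
        return "spam"  # high score plus 3 header and 3 body points
    return "no"


# A message scoring 0.05 is learned as ham with the stock threshold...
print(autolearn_decision(0.05, 0, 0))
# ...but not once the ham threshold is lowered to -3.0.
print(autolearn_decision(0.05, 0, 0, nonspam_threshold=-3.0))
```

The second call shows why lowering the ham threshold slows training: only messages that score clearly negative are fed to Bayes as ham.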


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Axb

On 08/15/2012 11:28 PM, JP Kelly wrote:

> Dumb question:
> How can I set the autolearn thresholds?
>
> On Aug 15, 2012, at 15 2:18 PM, John Hardin  wrote:
>
>> Setting the ham default threshold to -3 or even -5 seems prudent (_much_
>> better than the current 0.1)


In local.cf:

bayes_auto_learn_threshold_nonspam -3.0

# uncomment & change below if you want to raise or lower the spam learning threshold
#bayes_auto_learn_threshold_spam 15.0

Then reload spamd or your glue.

h2h

Axb


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread JP Kelly
Dumb question:
How can I set the autolearn thresholds?

On Aug 15, 2012, at 15 2:18 PM, John Hardin  wrote:

> Setting the ham default threshold to -3 or even -5 seems prudent (_much_ 
> better than the current 0.1)



Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Kevin A. McGrail

  
  
On 8/15/2012 5:18 PM, John Hardin wrote:
> On Wed, 15 Aug 2012, Kevin A. McGrail wrote:
>> On 8/15/2012 5:00 PM, John Hardin wrote:
>>> Right. It might be prudent to review the defaults before the next
>>> major release.
>>
>> I wonder if we shouldn't disable auto-learning by default
>> (assuming it's on by default)...
>
> It is.
>
>> Bayes should really be trained.
>
> I might not go so far as to say autolearn should be disabled by
> default, as it is a major good if well trained; but setting the
> defaults extreme enough that it is reliably, if slowly, initially
> trained seems to me a fair middle ground. Setting the ham default
> threshold to -3 or even -5 seems prudent (_much_ better than the
> current 0.1), then someone who actually wants to configure it can
> adjust based on how well it's performing and whether they want
> autolearn on at all.

Can you open a bug about that and let's see if we can get that done?
I agree that a slower training threshold makes sense.

-- 
Kevin A. McGrail
President

Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422

http://www.pccc.com/

703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-359-8451 (fax)
kmcgr...@pccc.com

Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, John Hardin wrote:

> I might not go so far as to say autolearn should be disabled by default,
> as it is a major good if well trained;


Sorry, poor wording, I meant to say "as _Bayes_ is a major good if well 
trained".




Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Kevin A. McGrail wrote:


> On 8/15/2012 5:00 PM, John Hardin wrote:
>> Right. It might be prudent to review the defaults before the next major
>> release.
>
> I wonder if we shouldn't disable auto-learning by default (assuming it's on
> by default)...

It is.

> Bayes should really be trained.


I might not go so far as to say autolearn should be disabled by default, 
as it is a major good if well trained; but setting the defaults extreme 
enough that it is reliably, if slowly, initially trained seems to me a 
fair middle ground. Setting the ham default threshold to -3 or even -5 
seems prudent (_much_ better than the current 0.1), then someone who 
actually wants to configure it can adjust based on how well it's 
performing and whether they want autolearn on at all.




Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Kevin A. McGrail

On 8/15/2012 5:00 PM, John Hardin wrote:


> Right. It might be prudent to review the defaults before the next
> major release.

I wonder if we shouldn't disable auto-learning by default (assuming it's
on by default)...


Bayes should really be trained.


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Kris Deugau wrote:


> John Hardin wrote:
>> I wasn't aware that autolearning could do a cold-start of Bayes, can
>> anyone confirm whether this is the case?
>
> If you let it run long enough to pass the 200/200 ham/spam thresholds,
> yes;  there's no distinction I've ever met about where the learning came
> from.
>
> That said, I wouldn't trust a pure autolearn setup with stock autolearn
> thresholds - all too much spam will get learned scoring under 0.1.  :(


Right. It might be prudent to review the defaults before the next major 
release.




Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Ben Johnson
On 8/15/2012 4:19 PM, Kris Deugau wrote:
> John Hardin wrote:
>> I wasn't aware that autolearning could do a cold-start of Bayes, can
>> anyone confirm whether this is the case?
> 
> If you let it run long enough to pass the 200/200 ham/spam thresholds,
> yes;  there's no distinction I've ever met about where the learning came
> from.
> 
> That said, I wouldn't trust a pure autolearn setup with stock autolearn
> thresholds - all too much spam will get learned scoring under 0.1.  :(
> 
> -kgd
> 

It's a bit disappointing to learn this (pardon the pun), given:

a.) This exchange between John Hardin and me, which occurred previously
in this thread:

---8<--

Me:

> Most of the list is probably laughing, but given the complexity of Spam
> Assassin, this crucial requirement was lost on me, amidst the sea of
> information and instructions. For example, there is no mention of the
> fact that SA is essentially useless without Bayesian training on
> http://wiki.apache.org/spamassassin/StartUsing .

John:

That's because that shouldn't be the case. The base ruleset + URIBL
should be very effective pretty much out-of-the-box.

---8<--

b.) The default value for bayes_auto_learn is 1 (on). (At least in my
particular distribution.)

Correct me if I'm wrong, but this issue's root cause seems to be that
bayes_auto_learn was on, out-of-the-box, yet I was not complementing its
efficacy via sa-learn.

Is this an accurate summary? Because if so, it seems prudent to change
the default bayes_auto_learn value to zero, and scorn any package
maintainer or developer who modifies it, or, alternatively, put a
banner, at font-size 100em, on the SpamAssassin homepage that issues an
unmistakable warning about Bayesian training's importance.

(John, I'll respond to your most recent message tomorrow most likely;
had enough for one day!)

Thank you,

-Ben


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Kris Deugau
John Hardin wrote:
> I wasn't aware that autolearning could do a cold-start of Bayes, can
> anyone confirm whether this is the case?

If you let it run long enough to pass the 200/200 ham/spam thresholds,
yes;  there's no distinction I've ever met about where the learning came
from.

That said, I wouldn't trust a pure autolearn setup with stock autolearn
thresholds - all too much spam will get learned scoring under 0.1.  :(

-kgd


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Ben Johnson wrote:


> On 8/15/2012 2:24 PM, John Hardin wrote:
>> On Wed, 15 Aug 2012, Ben Johnson wrote:
>>
>>> Some 99% of the spam that I receive, which is grossly spammy (we're
>>> talking auto loans, cash advances, dink pills, the whole lot) contains
>>> "BAYES_00=-1.9" in the tests portion of the X-Spam-Status header.
>>>
>>> Might anyone know why?
>>
>> Poor training.
>
> John, I can't thank you enough for the thoroughness of your response.

I like to show off. :)

>> Apart from the Bayes score, what kind of scores are those spams getting?
>
> Here are a few examples (the first two of which are two of VERY few in
> which the BAYES_* value is over 00):
>
> -
> No, score=0.192 tag=-999 tag2=3 kill=13 tests=[BAYES_20=-0.001,
> HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, RDNS_NONE=0.793,
> SPF_PASS=-0.001, URIBL_DBL_SPAM=1.7] autolearn=no
>
> No, score=2.241 tag=-999 tag2=3 kill=13 tests=[BAYES_20=-0.001,
> HTML_MESSAGE=0.001, RCVD_IN_BRBL_LASTEXT=1.449, RDNS_NONE=0.793,
> SPF_PASS=-0.001] autolearn=no
>
> No, score=-0.836 tag=-999 tag2=3 kill=13 tests=[BAYES_00=-1.9,
> HTML_MESSAGE=0.001, RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_DNSWL_MED=-2.3,
> RDNS_NONE=0.793, SPF_PASS=-0.001, URI_HEX=1.122] autolearn=no
>
> No, score=1.256 tag=-999 tag2=3 kill=13 tests=[BAYES_00=-1.9,
> HTML_MESSAGE=0.001, RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_DNSWL_MED=-2.3,
> RDNS_NONE=0.793, SPF_PASS=-0.001, URIBL_DBL_SPAM=1.7,
> URIBL_RHS_DOB=1.514] autolearn=no
> -


It might be interesting to see some log entries where autolearn=yes...

> It bears mention that the RCVD_IN_DNSWL_MED test is having even more of
> a negative impact (pardon the pun) than BAYES_*. I am already working
> with the dnswl.org folks (off-list, for privacy reasons) to get to the
> bottom of that issue.


This might be a major contributing factor. If your system was taught from 
scratch by autolearn, and DNSWL (which is fairly well trusted) has been 
pushing a lot of spams to low scores...


You might want to set:
bayes_auto_learn_threshold_nonspam -3

That won't _fix_ the problem (at least not quickly) or avoid the need to 
wipe and retrain, but it might keep things from getting worse.


See perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold for more info.


> Most of the list is probably laughing, but given the complexity of Spam
> Assassin, this crucial requirement was lost on me, amidst the sea of
> information and instructions. For example, there is no mention of the
> fact that SA is essentially useless without Bayesian training on
> http://wiki.apache.org/spamassassin/StartUsing .


That's because that shouldn't be the case. The base ruleset + URIBL should 
be very effective pretty much out-of-the-box.



>> What version of SA is this?
>
> # spamassassin --version
> SpamAssassin version 3.3.1
>  running on Perl version 5.10.1


A little stale, but not bad.


>> You may also want to set up some mechanism for users to submit
>> misclassified messages for training. Depending on how much you trust
>> their judgement the learning from these can be automatic or can go
>> through you as a reviewer.
>
> That sounds like a good idea. Is there a particular HOW TO or tutorial
> that you recommend? If it depends on the environment/configuration, this
> server runs Ubuntu 10.04 with Dovecot, Amavis, Sieve, and Spam Assassin.


I'm not sure, I don't lurk the Wiki much. About the best I can suggest is 
search the SA users mailing list archives for "training dovecot".




Re: RDNS_NONE

2012-08-15 Thread darxus
On 08/15, Matt wrote:
> I have messages marked as such:
> 
> RDNS_NONE Delivered to internal network by a host with no rDNS
> 
> Problem is they very clearly have reverse and matching forward DNS
> that Exim even agrees on.  Why is SA tagging them as such?

I wonder how much this is related to the other post I just made.  Exim is
notorious for allowing people to modify their Received headers in a way
that doesn't comply with anything.  Are they in headers SA is failing to
parse?  Run it through spamassassin -D.

-- 
"Safe is anywhere a hungry person can't walk in three days." - John Titor
http://www.ChaosReigns.com


Re: Received header syntax

2012-08-15 Thread darxus
On 08/15, Ori Bani wrote:
> I tried to intentionally make a terribly wrong Received to see if SA
> would give me a rule hit but it did not. Is there a rule for this? If
> so, how can I turn it on and off?

I don't think there is actually a rule for unparsable headers.  I think it
effectively just ignores received headers it can't parse.  So just run one
of your outgoing emails through spamassassin -D and look for lines like:

Aug 15 15:17:33.625 [23043] dbg: received-header: parsed as [ ip=140.211.11.3 
rdns=hermes.apache.org helo=mail.apache.org by=panic.chaosreigns.com ident= 
envfrom= intl=0 id=C6F0CCD227 auth= msa=0 ]

To make sure it has parsed successfully.
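
If you want to check many debug logs for that line, a small helper can pull the fields out. This is only a sketch that assumes the debug line looks like the sample above (space-separated `key=value` pairs inside `[ ... ]`); the function name is made up, and values containing spaces would not be handled.

```python
import re


def parse_received_debug(line: str) -> dict:
    """Extract key=value fields from a 'received-header: parsed as [...]' line."""
    match = re.search(r"parsed as \[\s*(.*?)\s*\]", line)
    if not match:
        return {}  # not a successfully parsed Received header line
    fields = {}
    for token in match.group(1).split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


sample = ("Aug 15 15:17:33.625 [23043] dbg: received-header: parsed as "
          "[ ip=140.211.11.3 rdns=hermes.apache.org helo=mail.apache.org "
          "by=panic.chaosreigns.com ident= envfrom= intl=0 id=C6F0CCD227 "
          "auth= msa=0 ]")
fields = parse_received_debug(sample)
print(fields["ip"], fields["rdns"], fields["helo"])
```

An empty `rdns` field in a parsed line is worth looking at when chasing an unexpected RDNS_NONE hit.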

> Is there a place I can test only this rule?

No.

-- 
"I always wonder why birds stay in the same place when they can
fly anywhere on the earth.  Then I ask myself the same question."
- Harun Yahya
http://www.ChaosReigns.com


RDNS_NONE

2012-08-15 Thread Matt
I have messages marked as such:

RDNS_NONE Delivered to internal network by a host with no rDNS

Problem is they very clearly have reverse and matching forward DNS
that Exim even agrees on.  Why is SA tagging them as such?


Re: Received header syntax

2012-08-15 Thread Ori Bani
On Tue, Aug 14, 2012 at 8:19 PM, David F. Skoll  wrote:
> On Tue, 14 Aug 2012 20:01:13 -0700
> Ori Bani  wrote:
>
>> There are a few changes we want to make to our outgoing email headers,
>> including to the Received headers that our MTA adds. I know that some
>> tools including SA have some tests that judge spamminess based on
>> malformed Received headers, but I have not been able to find anywhere
>> that describes a definitive valid syntax for that header.
>
> RFC 5321, section 4.4 has a BNF description of a Received: header.
>
> http://tools.ietf.org/html/rfc5321#section-4.4

Thank you, although I wonder where the definition of "Protocol" and
"Domain" and some other things are, it's easy enough to guess.

I tried to intentionally make a terribly wrong Received to see if SA
would give me a rule hit but it did not. Is there a rule for this? If
so, how can I turn it on and off?

Is there a place I can test only this rule?


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Ben Johnson
On 8/15/2012 2:24 PM, John Hardin wrote:
> On Wed, 15 Aug 2012, Ben Johnson wrote:
> 
>> Some 99% of the spam that I receive, which is grossly spammy (we're
>> talking auto loans, cash advances, dink pills, the whole lot) contains
>> "BAYES_00=-1.9" in the tests portion of the X-Spam-Status header.
>>
>> Might anyone know why?
> 
> Poor training.

John, I can't thank you enough for the thoroughness of your response.

> Apart from the Bayes score, what kind of scores are those spams getting?

Here are a few examples (the first two of which are two of VERY few in
which the BAYES_* value is over 00):

-
No, score=0.192 tag=-999 tag2=3 kill=13 tests=[BAYES_20=-0.001,
HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, RDNS_NONE=0.793,
SPF_PASS=-0.001, URIBL_DBL_SPAM=1.7] autolearn=no

No, score=2.241 tag=-999 tag2=3 kill=13 tests=[BAYES_20=-0.001,
HTML_MESSAGE=0.001, RCVD_IN_BRBL_LASTEXT=1.449, RDNS_NONE=0.793,
SPF_PASS=-0.001] autolearn=no

No, score=-0.836 tag=-999 tag2=3 kill=13 tests=[BAYES_00=-1.9,
HTML_MESSAGE=0.001, RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_DNSWL_MED=-2.3,
RDNS_NONE=0.793, SPF_PASS=-0.001, URI_HEX=1.122] autolearn=no

No, score=1.256 tag=-999 tag2=3 kill=13 tests=[BAYES_00=-1.9,
HTML_MESSAGE=0.001, RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_DNSWL_MED=-2.3,
RDNS_NONE=0.793, SPF_PASS=-0.001, URIBL_DBL_SPAM=1.7,
URIBL_RHS_DOB=1.514] autolearn=no
-
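
To see how much the two negative rules contribute, the per-test scores in one of the status lines above can be summed directly. The parser below is a small sketch written for this header layout only; `parse_tests` is a hypothetical helper, not part of SpamAssassin or Amavis.

```python
import re


def parse_tests(status: str) -> dict:
    """Return {rule_name: score} from the tests=[...] part of a status line."""
    match = re.search(r"tests=\[(.*?)\]", status, re.S)
    if not match:
        return {}
    scores = {}
    for item in match.group(1).split(","):
        name, sep, value = item.strip().partition("=")
        if sep:
            scores[name] = float(value)
    return scores


status = """No, score=-0.836 tag=-999 tag2=3 kill=13 tests=[BAYES_00=-1.9,
HTML_MESSAGE=0.001, RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_DNSWL_MED=-2.3,
RDNS_NONE=0.793, SPF_PASS=-0.001, URI_HEX=1.122] autolearn=no"""

scores = parse_tests(status)
total = round(sum(scores.values()), 3)
without_negatives = round(
    sum(v for k, v in scores.items()
        if k not in ("BAYES_00", "RCVD_IN_DNSWL_MED")), 3)
print(total)              # matches the score= value in the header
print(without_negatives)  # score if BAYES_00 and DNSWL had not hit
```

For this message the two rules together remove 4.2 points, which is the difference between a negative total and one above the tag2=3 value shown in the header.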

It bears mention that the RCVD_IN_DNSWL_MED test is having even more of
a negative impact (pardon the pun) than BAYES_*. I am already working
with the dnswl.org folks (off-list, for privacy reasons) to get to the
bottom of that issue.

>> While I have not trained the Bayesian filter manually to date,
> 
> Is there any provision for any manual training in your environment? Have
> you set up training folders where your users can submit message for
> training? Do you run sa-learn at all?

No, there is no provision. No, I have not set up training folders, and
no, I have not run sa-learn manually at all.

Most of the list is probably laughing, but given the complexity of Spam
Assassin, this crucial requirement was lost on me, amidst the sea of
information and instructions. For example, there is no mention of the
fact that SA is essentially useless without Bayesian training on
http://wiki.apache.org/spamassassin/StartUsing .

>> how is it that the spammiest of the spam is being classified with
>> BAYES_00 (thereby receiving the score -1.9)? Doesn't BAYES_00 imply
>> that the message is almost certainly not spam?
> 
> BAYES_00 implies that the message in question looks very similar to
> messages the Bayes system has been told are not spam. It depends solely
> on how it has been trained.
> 
> I wasn't aware that autolearning could do a cold-start of Bayes, can
> anyone confirm whether this is the case?
> 
> If it can't then someone somewhere trained bayes up to the default
> minimum 200 hams and 200 spams needed for it to start classifying.
> 
> Before we offer suggestions, some more data from you please:
> 
> What version of SA is this?

# spamassassin --version
SpamAssassin version 3.3.1
  running on Perl version 5.10.1

> What does "sa-learn --dump magic" report about your current Bayes database?

# sa-learn --dump magic
ERROR: Bayes dump returned an error, please re-run with -D for more
information

# su amavis -c 'sa-learn --dump magic'

# su amavis -c 'sa-learn --dump magic'
0.000  0           3  0  non-token data: bayes db version
0.000  0       11499  0  non-token data: nspam
0.000  0       39412  0  non-token data: nham
0.000  0      197769  0  non-token data: ntokens
0.000  0  1344331893  0  non-token data: oldest atime
0.000  0  1345056746  0  non-token data: newest atime
0.000  0  1345053771  0  non-token data: last journal sync atime
0.000  0  1345023550  0  non-token data: last expiry atime
0.000  0      345600  0  non-token data: last expire atime delta
0.000  0        6482  0  non-token data: last expire reduction count
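
The counts that matter here are `nspam` and `nham`. As a sketch, assuming the five-column layout shown above, they can be pulled out and compared with the 200/200 minimum mentioned elsewhere in this thread; `parse_magic` and `MIN_LEARNED` are names invented for the example.

```python
MIN_LEARNED = 200  # ham and spam minimum cited in this thread


def parse_magic(dump: str) -> dict:
    """Map each 'non-token data' label to its integer value (third column)."""
    prefix = "non-token data: "
    values = {}
    for line in dump.splitlines():
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[4].startswith(prefix):
            values[parts[4][len(prefix):].strip()] = int(parts[2])
    return values


dump = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0  11499  0  non-token data: nspam
0.000  0  39412  0  non-token data: nham
0.000  0 197769  0  non-token data: ntokens
"""

magic = parse_magic(dump)
active = magic["nspam"] >= MIN_LEARNED and magic["nham"] >= MIN_LEARNED
print(magic["nspam"], magic["nham"], active)
```

With these numbers Bayes is well past the minimum and has learned more than three times as many messages as ham than as spam, all without any manual sa-learn run.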

> What are all of the bayes_* configuration options in your local config?

None are defined there. There are a few defaults/examples, but they are
commented-out.

> 
> What will probably end up happening is this:
> (1) wipe your Bayes database
> (2) turn off autolearn
> (3) collect several hundred hams and spams for an initial training corpus
> (4) train using that corpus
> (5) evaluate results
> 
> Depending on your mail volume, once Bayes is working well after manual
> training, you may then want to reenable autolearn; I personally suggest
> it only where the volume is high enough and/or the character of mail is
> varied enough to prohibit manual training. You might also want to adjust
> the autolearn thresholds.

That makes sense; thank you for the suggestion.

> You may also want to set up some mechanism for users to submit
> misclassified messages for trainin

Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Jeff Mincy
   From: Ben Johnson 
   Date: Wed, 15 Aug 2012 13:36:08 -0400
   
   Some 99% of the spam that I receive, which is grossly spammy (we're
   talking auto loans, cash advances, dink pills, the whole lot) contains
   "BAYES_00=-1.9" in the tests portion of the X-Spam-Status header.
   
   Might anyone know why? This is a stock installation (Ubuntu package on
   10.04).
   
Most likely you've let autolearn learn a large number of spam messages
as ham.  Any autolearn mistakes need to be corrected.

One or two spam messages with BAYES_00 is not a problem, but a large
number of them indicates a serious problem with learning.   If you
have the old spam messages then you can retrain correctly.  Otherwise
it would probably be best to start over by deleting the bayes database.

   local.cf contains
   
   #   Bayesian classifier auto-learning (default: 1)
   #
   # bayes_auto_learn 1
   
   and I have not overridden the default elsewhere. So, presumably,
   auto-learning is enabled (if that's even relevant).
   
   While I have not trained the Bayesian filter manually to date, how is it
   that the spammiest of the spam is being classified with BAYES_00
   (thereby receiving the score -1.9)? Doesn't BAYES_00 imply that the
   message is almost certainly not spam?

Yes, BAYES_00 says the spam probability is between 0 and 1%.

   http://forums.eukhost.com/f38/problems-spamassassin-bayes-filter-16948/
   
   Outside of the above forum post, search query results for this issue are
   scant.

There have been numerous posts on BAYES.

-jeff


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Jari Fredriksson wrote:


> 15.08.2012 20:36, Ben Johnson wrote:
>> While I have not trained the Bayesian filter manually to date, how is it
>> that the spammiest of the spam is being classified with BAYES_00
>> (thereby receiving the score -1.9)? Doesn't BAYES_00 imply that the
>> message is almost certainly not spam?
>
> How could the Bayes classifier know that it is spammy, if no one makes it
> learn what spam looks like?
>
> Start training it now.

If he's getting BAYES_00 hits, _something_ has trained it.



Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Ben Johnson wrote:


> Some 99% of the spam that I receive, which is grossly spammy (we're
> talking auto loans, cash advances, dink pills, the whole lot) contains
> "BAYES_00=-1.9" in the tests portion of the X-Spam-Status header.
>
> Might anyone know why?

Poor training.

Apart from the Bayes score, what kind of scores are those spams
getting?

> While I have not trained the Bayesian filter manually to date,

Is there any provision for any manual training in your environment? Have
you set up training folders where your users can submit message for
training? Do you run sa-learn at all?

> how is it that the spammiest of the spam is being classified with
> BAYES_00 (thereby receiving the score -1.9)? Doesn't BAYES_00 imply that
> the message is almost certainly not spam?


BAYES_00 implies that the message in question looks very similar to 
messages the Bayes system has been told are not spam. It depends solely on 
how it has been trained.


I wasn't aware that autolearning could do a cold-start of Bayes, can 
anyone confirm whether this is the case?


If it can't then someone somewhere trained bayes up to the default minimum 
200 hams and 200 spams needed for it to start classifying.


Before we offer suggestions, some more data from you please:

What version of SA is this?

What does "sa-learn --dump magic" report about your current Bayes 
database?
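
As a rough illustration of what to look for in that output, the sketch below pulls the nspam and nham counters out of a "sa-learn --dump magic" listing and compares them with the 200/200 minimum mentioned above. The sample text is invented, and the column layout is assumed to match what your version prints. The helper names parse_magic and bayes_is_active are hypothetical.

```python
# Illustrative sample only: the numbers are invented, and the column layout
# is assumed to match what "sa-learn --dump magic" prints on your system.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        212          0  non-token data: nspam
0.000          0       4310          0  non-token data: nham
0.000          0      58210          0  non-token data: ntokens
"""

MIN_TRAINED = 200  # default minimum per class, as stated in the thread


def parse_magic(dump_text):
    """Extract the 'non-token data' counters into a dict of name -> int."""
    counters = {}
    for line in dump_text.splitlines():
        fields, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # not a counter line
        columns = fields.split()
        if len(columns) >= 3:
            counters[label.strip()] = int(columns[2])  # third column = value
    return counters


def bayes_is_active(counters, minimum=MIN_TRAINED):
    """True when both classes have reached the minimum message count."""
    return (counters.get("nspam", 0) >= minimum
            and counters.get("nham", 0) >= minimum)


counters = parse_magic(SAMPLE_DUMP)
print(counters["nspam"], counters["nham"], bayes_is_active(counters))
```

A large gap between nham and nspam, as in the invented sample, would be one sign that the database has learned mostly ham.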


What are all of the bayes_* configuration options in your local config?


What will probably end up happening is this:
(1) wipe your Bayes database
(2) turn off autolearn
(3) collect several hundred hams and spams for an initial training corpus
(4) train using that corpus
(5) evaluate results
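
The steps above can be sketched as a small dry-run script. It only uses sa-learn options that I am confident exist (--clear, --ham, --spam, --dump magic). The corpus paths are placeholders, and the helper names are hypothetical. Step (2) is a configuration change ("bayes_auto_learn 0" in local.cf) rather than a command, so it appears only as a comment.

```python
import subprocess


def retraining_commands(ham_dir, spam_dir):
    """Build the sa-learn invocations for steps (1), (4) and (5)."""
    return [
        ["sa-learn", "--clear"],           # (1) wipe the Bayes database
        ["sa-learn", "--ham", ham_dir],    # (4) train on known-good mail
        ["sa-learn", "--spam", spam_dir],  # (4) train on known spam
        ["sa-learn", "--dump", "magic"],   # (5) inspect the resulting counts
    ]


def run_plan(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Step (2) is a config change, not a command: set "bayes_auto_learn 0".
# The paths below are placeholders for wherever the corpus from (3) lives.
plan = retraining_commands("/path/to/corpus/ham", "/path/to/corpus/spam")
run_plan(plan)  # dry run: prints the commands without touching anything
```

Run it as the same user that SpamAssassin scans as, otherwise the training may go into a different Bayes database than the one being consulted.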

Depending on your mail volume, once Bayes is working well after manual 
training, you may then want to reenable autolearn; I personally suggest it 
only where the volume is high enough and/or the character of mail is 
varied enough to prohibit manual training. You might also want to adjust 
the autolearn thresholds.
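
For a feel of what those thresholds do, here is a deliberately simplified sketch. It assumes the two options are bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam with stock defaults of 0.1 and 12.0 (check the documentation for your version). The real autolearn logic has further conditions that are not modelled here, and the function name is hypothetical.

```python
def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Simplified model: return 'ham', 'spam', or None (do not learn).

    The defaults mirror what I believe are the stock values of
    bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam.
    """
    if score >= spam_threshold:
        return "spam"
    if score < nonspam_threshold:
        return "ham"
    return None  # score is in the uncertain middle, so nothing is learned


for score in (-0.5, 4.2, 15.0):
    print(score, autolearn_decision(score))
```

Under this model a spam that the other rules miss, and that therefore scores very low, would be learned as ham. That is one way a database could drift toward BAYES_00 on spam, and one reason to tighten the nonspam threshold.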


You may also want to set up some mechanism for users to submit 
misclassified messages for training. Depending on how much you trust their 
judgement the learning from these can be automatic or can go through you 
as a reviewer.


Recommendation: keep your manual training corpus around in case you need 
to do the above again for some reason.


--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Judicial Activism (n): interpreting the Constitution to grant the
  government powers that are popularly felt to be "needed" but that
  are not explicitly provided for therein (common definition);
  interpreting the Constitution as it is written (Brady definition)
---
 Today: the 67th anniversary of the end of World War II


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Jari Fredriksson
15.08.2012 20:36, Ben Johnson wrote:
> Hello,
>
> Some 99% of the spam that I receive, which is grossly spammy (we're
> talking auto loans, cash advances, dink pills, the whole lot) contains
> "BAYES_00=-1.9" in the tests portion of the X-Spam-Status header.
>
> Might anyone know why? This is a stock installation (Ubuntu package on
> 10.04).
>
> local.cf contains
>
> #   Bayesian classifier auto-learning (default: 1)
> #
> # bayes_auto_learn 1
>
> and I have not overridden the default elsewhere. So, presumably,
> auto-learning is enabled (if that's even relevant).
>
> While I have not trained the Bayesian filter manually to date, how is it
> that the spammiest of the spam is being classified with BAYES_00
> (thereby receiving the score -1.9)? Doesn't BAYES_00 imply that the
> message is almost certainly not spam?
How could the Bayes classifier know that it is spammy if no one has
taught it what spam looks like?

Start training it now.

>
> Others have run into this same problem, but I see no resolution; here is
> one such example:
>
> http://forums.eukhost.com/f38/problems-spamassassin-bayes-filter-16948/
>
> Outside of the above forum post, search query results for this issue are
> scant.
>
> Thanks for any help,
>
> -Ben
>


-- 

"Never thought the space i "Program Files" would be a problem in Linux"

Husse Apr 9 2007






Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Ben Johnson
Hello,

Some 99% of the spam that I receive, which is grossly spammy (we're
talking auto loans, cash advances, dink pills, the whole lot) contains
"BAYES_00=-1.9" in the tests portion of the X-Spam-Status header.

Might anyone know why? This is a stock installation (Ubuntu package on
10.04).

local.cf contains

#   Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1

and I have not overridden the default elsewhere. So, presumably,
auto-learning is enabled (if that's even relevant).

While I have not trained the Bayesian filter manually to date, how is it
that the spammiest of the spam is being classified with BAYES_00
(thereby receiving the score -1.9)? Doesn't BAYES_00 imply that the
message is almost certainly not spam?

Others have run into this same problem, but I see no resolution; here is
one such example:

http://forums.eukhost.com/f38/problems-spamassassin-bayes-filter-16948/

Outside of the above forum post, search query results for this issue are
scant.

Thanks for any help,

-Ben


Re: Bogus authorize.net statements

2012-08-15 Thread Kevin A. McGrail

On 8/15/2012 12:57 PM, dar...@chaosreigns.com wrote:

On 08/15, Jim Schueler wrote:

the attached.  All share a common marker of embedding a text url within an
HTML <a> tag containing a different URL.  This seems like an obvious
marker for spam, I wonder why there isn't a rule for it.

There is a rule.  It hits 10x as much non-spam as spam:

ruleqa.spamassassin.org/?rule=%2Fspoofed_url

There was some work on improving it:
http://osdir.com/ml/users-spamassassin/2011-10/msg00237.html

It didn't work out:
http://osdir.com/ml/users-spamassassin/2011-10/msg00304.html

Feel free to try to do better.

Thanks for finding this.  I also have some analysis of my own corpus 
somewhere, though I doubt it would be different, except that your corpus 
likely doesn't include emails with images, so it's a bit skewed in the other 
direction, as that likely blocks the advertising tracker companies.


Regards,
KAM


Re: SpamAssassin Hanging on RTF Attachments

2012-08-15 Thread Kevin A. McGrail

  
  
On 8/15/2012 11:24 AM, Henrik K wrote:

On Wed, Aug 15, 2012 at 11:14:58AM -0400, Kevin A. McGrail wrote:

Henrik, why don't you think the timeout hit?

Probably because the regexps are hanging, and it's impossible to time them out.


Interesting. OK. I look forward to seeing whether your patch helped!

-- 
  Kevin A. McGrail
  President
  
Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422
  
http://www.pccc.com/
  
703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-359-8451 (fax)
kmcgr...@pccc.com
  
  
  

  



Re: Bogus authorize.net statements

2012-08-15 Thread darxus
On 08/15, Jim Schueler wrote:
>the attached.  All share a common marker of embedding a text url within an
>HTML <a> tag containing a different URL.  This seems like an obvious
>marker for spam, I wonder why there isn't a rule for it.

There is a rule.  It hits 10x as much non-spam as spam:

ruleqa.spamassassin.org/?rule=%2Fspoofed_url

There was some work on improving it:
http://osdir.com/ml/users-spamassassin/2011-10/msg00237.html

It didn't work out:
http://osdir.com/ml/users-spamassassin/2011-10/msg00304.html

Feel free to try to do better.

-- 
"Just because you're offended, doesn't mean you're right." - Ricky Gervais
http://www.ChaosReigns.com


Re: Bogus authorize.net statements

2012-08-15 Thread David F. Skoll
Somewhat OT, but I'm getting SPF "fail" on all the bogus authorize.net
spams I've seen.  That should be enough to whack 'em.

Regards,

David.


Re: Bogus authorize.net statements

2012-08-15 Thread Kevin A. McGrail


Okay, let me modify my suggestion, then: if you can detect where the 
displayed text for a link is a URL, and the domain name in that URL 
does not match the domain name in the href, then it might be useful.


Does that seem more possible?


Nope.  Just look at millions of things sent by constantcontact.com where 
they add their tracking links to the newsletter content.


Sorry to be negative but I really don't think you are going to find this 
to be an indication of spam or ham.


Re: Bogus authorize.net statements

2012-08-15 Thread Axb

On 08/15/2012 06:09 PM, John Hardin wrote:

On Wed, 15 Aug 2012, Kevin A. McGrail wrote:


On 8/15/2012 11:35 AM, John Hardin wrote:

 On Wed, 15 Aug 2012, Jim Schueler wrote:

>  Is there such a rule?

 No, not at present.

>  Can I write one (I consider myself a bit of a Perl wonk)?

 Sure. Post it here and one of the rule committers can add it to their
 sandbox for testing against the masscheck corpora.

 The problem with what you suggest is that having a different description
 in the displayed text for a link is extremely common.

 If you can manage to write a regex that detects a link tag where the
 displayed text differs from the href _AND_ the displayed text is a URL,
 then it might be useful. Just triggering on displayed text != href is not
 useful.


I am 99.9% sure I've personally done research on this and it was no
indication of SPAM or HAM.  It is equally used in both and anecdotal
checks yesterday confirmed it.

IMO, this is a waste of time you can confirm simply by checking a
couple of legit email newsletters, for example.


Okay, let me modify my suggestion, then: if you can detect where the
displayed text for a link is a URL, and the domain name in that URL does
not match the domain name in the href, then it might be useful.

Does that seem more possible?


Wouldn't URIDetail do this?





Re: Bogus authorize.net statements

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Kevin A. McGrail wrote:


On 8/15/2012 11:35 AM, John Hardin wrote:

 On Wed, 15 Aug 2012, Jim Schueler wrote:

>  Is there such a rule?

 No, not at present.

>  Can I write one (I consider myself a bit of a Perl wonk)?

 Sure. Post it here and one of the rule committers can add it to their
 sandbox for testing against the masscheck corpora.

 The problem with what you suggest is that having a different description
 in the displayed text for a link is extremely common.

 If you can manage to write a regex that detects a link tag where the
 displayed text differs from the href _AND_ the displayed text is a URL,
 then it might be useful. Just triggering on displayed text != href is not
 useful.


I am 99.9% sure I've personally done research on this and it was no 
indication of SPAM or HAM.  It is equally used in both and anecdotal checks 
yesterday confirmed it.


IMO, this is a waste of time you can confirm simply by checking a couple of 
legit email newsletters, for example.


Okay, let me modify my suggestion, then: if you can detect where the 
displayed text for a link is a URL, and the domain name in that URL does 
not match the domain name in the href, then it might be useful.


Does that seem more possible?
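
As a rough sketch of the check being proposed (not an actual SpamAssassin rule or plugin), the code below collects every link in an HTML body and reports those whose displayed text looks like a URL with a different host than the href. All names are hypothetical, the "looks like a URL" test is a crude prefix match, and hosts are compared in full rather than by registered domain. As noted elsewhere in this thread, legitimate newsletters with tracking links would trip the same check.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, displayed text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url):
    """Lower-cased host of the first token in url; '' if none is found."""
    parts = url.split()
    if not parts:
        return ""
    candidate = parts[0]
    if "://" not in candidate:
        candidate = "http://" + candidate  # tolerate "www.example.org"
    try:
        return (urlsplit(candidate).hostname or "").lower()
    except ValueError:
        return ""


def looks_like_url(text):
    """Crude test for displayed text that presents itself as a URL."""
    return text.lower().startswith(("http://", "https://", "www."))


def mismatched_links(html):
    """Links whose displayed URL names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if looks_like_url(text) and host_of(text) != host_of(href)]


sample = ('<a href="http://evil.example/login">https://www.authorize.net/</a> '
          '<a href="http://shop.example/">our shop</a> '
          '<a href="https://www.example.org/a">www.example.org</a>')
print(mismatched_links(sample))
```

Only the first link in the sample is reported. The second has ordinary descriptive text, and the third displays the same host it points to.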

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Riff: Torg, you traded our magic beans for a _cow_?
  Torg: It's a _magic_ cow! It's full of steaks!
  Riff: Whoa!-- Sluggy 04/28/2002
---
 Today: the 67th anniversary of the end of World War II


Re: Bogus authorize.net statements

2012-08-15 Thread Axb

On 08/15/2012 06:01 PM, Kevin A. McGrail wrote:

On 8/15/2012 11:35 AM, John Hardin wrote:

On Wed, 15 Aug 2012, Jim Schueler wrote:


Is there such a rule?


No, not at present.


Can I write one (I consider myself a bit of a Perl wonk)?


Sure. Post it here and one of the rule committers can add it to their
sandbox for testing against the masscheck corpora.


test rule on its way



Re: Bogus authorize.net statements

2012-08-15 Thread Kevin A. McGrail

On 8/15/2012 11:35 AM, John Hardin wrote:

On Wed, 15 Aug 2012, Jim Schueler wrote:


Is there such a rule?


No, not at present.


Can I write one (I consider myself a bit of a Perl wonk)?


Sure. Post it here and one of the rule committers can add it to their 
sandbox for testing against the masscheck corpora.


The problem with what you suggest is that having a different 
description in the displayed text for a link is extremely common.


If you can manage to write a regex that detects a link tag where the 
displayed text differs from the href _AND_ the displayed text is a 
URL, then it might be useful. Just triggering on displayed text != 
href is not useful.


I am 99.9% sure I've personally done research on this and it was no 
indication of SPAM or HAM.  It is equally used in both and anecdotal 
checks yesterday confirmed it.


IMO, this is a waste of time you can confirm simply by checking a couple 
of legit email newsletters, for example.


Regards,
KAM


Re: Bogus authorize.net statements

2012-08-15 Thread John Hardin

On Wed, 15 Aug 2012, Jim Schueler wrote:


Is there such a rule?


No, not at present.


Can I write one (I consider myself a bit of a Perl wonk)?


Sure. Post it here and one of the rule committers can add it to their 
sandbox for testing against the masscheck corpora.


The problem with what you suggest is that having a different description 
in the displayed text for a link is extremely common.


If you can manage to write a regex that detects a link tag where the 
displayed text differs from the href _AND_ the displayed text is a URL, 
then it might be useful. Just triggering on displayed text != href is not 
useful.


--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Maxim I: Pillage, _then_ burn.
---
 Today: the 67th anniversary of the end of World War II


Re: Bogus authorize.net statements

2012-08-15 Thread Jim Schueler
Is there such a rule?  Can I write one (I consider myself a bit of a Perl 
wonk)?


I understand that there are few, if any, markers that definitively define 
spam; and that's the beauty of the SpamAssassin architecture.


 -Jim

On Wed, 15 Aug 2012, Kevin A. McGrail wrote:


On 8/15/2012 11:06 AM, Jim Schueler wrote:
  Upon Kevin's recommendation, I upgraded.  Big difference.
   'Though there's a bit of a retuning penalty.

Woohoo, I was right!  All I did was flip a coin, though ;-)
  I get quite a few authorize.net notifications on behalf of
  various ecommerce clients, and this morning I started seeing
  scam/spam similar to the attached.  All share a common marker of
  embedding a text url within an HTML <a> tag containing a
  different URL.  This seems like an obvious marker for spam, I
  wonder why there isn't a rule for it.

There are many patterns that show up in spam that unfortunately show up in
ham as well.  If my memory serves me correctly, this just isn't indicative of
spam or ham.

HOWEVER, some mail systems with good glue like MIMEDefang can do things like
disable links that do this or redirect them to a CGI that gives the end-user
some warning, etc.

Regards,
KAM



Re: SpamAssassin Hanging on RTF Attachments

2012-08-15 Thread Henrik K
On Wed, Aug 15, 2012 at 11:14:58AM -0400, Kevin A. McGrail wrote:
> 
> Henrik, why don't you think the timeout hit?

Probably because the regexps are hanging, and it's impossible to time them out.



Re: SpamAssassin Hanging on RTF Attachments

2012-08-15 Thread Kevin A. McGrail

On 8/15/2012 11:11 AM, John Evans wrote:

On 2012-08-14 21:20, John Evans wrote:

On 2012-08-14 21:13, Kevin A. McGrail wrote:
Here's the output of -D -t on the file. I let it run for about 10 
minutes before giving up and killing the process.


Out of interest, can you let it run longer?  Say an hour just to see
if it does finish processing?

regards,
KAM


You bet! I'll fire it up in a screen session tonight (about to go to
bed) and check on it in the morning when I get to work. That should
give it PLENTY of time to finish up if it's going to finish.

I'll be in touch tomorrow with (hopefully) more information.


The SA check finally finished. Here are the full debug logs from the 
scan. As you can see at the "23:59:21.418" mark, a timeout hits and 
forces SA to move on. It looks like that time out is set to many 
minutes, and I thought the default config for time_limit was 300 seconds?


Exactly my concern as well.  Even if the RTF container's MIME type was 
set incorrectly, I would still expect a timeout because there have to be 
allowances for this.
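
To put a number on how far past the limit the scan ran, here is a small sketch that subtracts two debug-log timestamps. It uses the first timestamp shown in the posted log excerpt and the "23:59:21.418" mark quoted above. It assumes both fall on Aug 14 2012 and that the first one is close to the start of the scan. The function name is hypothetical.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y %b %d %H:%M:%S.%f"
DEFAULT_TIME_LIMIT = 300  # seconds, the time_limit default John expected


def elapsed_seconds(start, end, year=2012):
    """Seconds between two 'Mon DD HH:MM:SS.mmm' debug-log timestamps."""
    # The log omits the year, so one is supplied (assumption: same year).
    t0 = datetime.strptime(f"{year} {start}", LOG_TIME_FORMAT)
    t1 = datetime.strptime(f"{year} {end}", LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


elapsed = elapsed_seconds("Aug 14 22:21:29.635", "Aug 14 23:59:21.418")
print(f"{elapsed:.3f} s elapsed, "
      f"{elapsed / DEFAULT_TIME_LIMIT:.1f}x the expected limit")
```

Under those assumptions the scan ran for roughly an hour and 38 minutes before anything interrupted it.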


Henrik, why don't you think the timeout hit?

Regards,
KAM


Re: SpamAssassin Hanging on RTF Attachments

2012-08-15 Thread John Evans

On 2012-08-14 23:34, Henrik K wrote:

On Wed, Aug 15, 2012 at 09:31:40AM +0300, Henrik K wrote:

On Tue, Aug 14, 2012 at 09:20:26PM -0700, John Evans wrote:
> On 2012-08-14 21:13, Kevin A. McGrail wrote:
> >>Here's the output of -D -t on the file. I let it run for about
> >>10 minutes before giving up and killing the process.
> >
> >Out of interest, can you let it run longer?  Say an hour just to see
> >if it does finish processing?
> >
> >regards,
> >KAM
>
> You bet! I'll fire it up in a screen session tonight (about to go to
> bed) and check on it in the morning when I get to work. That should
> give it PLENTY of time to finish up if it's going to finish.
>
> I'll be in touch tomorrow with (hopefully) more information.

Nothing new about this problem, it's "well known". RTF incorrectly marked as
text/plain will be scanned as body and regexps go wild from all the
formatting.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6582
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6584


Correction, anything marked text/* will be scanned.. anyways, if you want to
fix this, you can try applying my patch which is found there.
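
Separately from the patch (whose contents are not shown here), a pre-filter could spot the problem case by looking for parts declared as text/* whose decoded content starts with the RTF signature "{\rtf". The sketch below is only an illustration of that idea using Python's standard email module. The function name and the sample message are hypothetical.

```python
from email import message_from_bytes

RTF_MAGIC = b"{\\rtf"  # every RTF document begins with this signature


def mislabeled_rtf_parts(raw_message):
    """Return (content_type, filename) for text/* parts that are really RTF."""
    msg = message_from_bytes(raw_message)
    found = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text attachments
        payload = part.get_payload(decode=True) or b""
        if payload.lstrip().startswith(RTF_MAGIC):
            found.append((part.get_content_type(), part.get_filename()))
    return found


# Hypothetical test message: an RTF attachment declared as text/plain.
raw = (b"From: sender@example.com\r\n"
       b"Subject: test\r\n"
       b"MIME-Version: 1.0\r\n"
       b"Content-Type: text/plain; name=\"doc.rtf\"\r\n"
       b"Content-Disposition: attachment; filename=\"doc.rtf\"\r\n"
       b"\r\n"
       b"{\\rtf1\\ansi Hello}\r\n")
print(mislabeled_rtf_parts(raw))
```

A message flagged this way could be diverted or relabelled before it reaches SpamAssassin, so the body regexps never see the RTF formatting.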


Thanks for the patch! I'll apply it to my system and see if this 
resolves the problems that I'm seeing.




Re: SpamAssassin Hanging on RTF Attachments

2012-08-15 Thread John Evans

On 2012-08-14 21:20, John Evans wrote:

On 2012-08-14 21:13, Kevin A. McGrail wrote:
Here's the output of -D -t on the file. I let it run for about 10 
minutes before giving up and killing the process.


Out of interest, can you let it run longer?  Say an hour just to see
if it does finish processing?

regards,
KAM


You bet! I'll fire it up in a screen session tonight (about to go to
bed) and check on it in the morning when I get to work. That should
give it PLENTY of time to finish up if it's going to finish.

I'll be in touch tomorrow with (hopefully) more information.


The SA check finally finished. Here are the full debug logs from the 
scan. As you can see at the "23:59:21.418" mark, a timeout hits and 
forces SA to move on. It looks like that time out is set to many 
minutes, and I thought the default config for time_limit was 300 
seconds?


Aug 14 22:21:29.635 [6233] dbg: rules: run_generic_tests - compiling eval code: meta, priority -900
Aug 14 22:21:29.635 [6233] dbg: rules: compiled meta tests
Aug 14 22:21:29.635 [6233] dbg: check: running tests for priority: -400
Aug 14 22:21:29.635 [6233] dbg: rules: running head tests; score so far=-0.0001
Aug 14 22:21:29.635 [6233] dbg: rules: flush_evalstr (run_generic_tests) compiling 280 chars of Mail::SpamAssassin::Plugin::Check::_head_tests_neg400_1
Aug 14 22:21:29.636 [6233] dbg: rules: run_generic_tests - compiling eval code: head, priority -400
Aug 14 22:21:29.636 [6233] dbg: rules: compiled head tests
Aug 14 22:21:29.636 [6233] dbg: rules: running body tests; score so far=-0.0001
Aug 14 22:21:29.636 [6233] dbg: rules: flush_evalstr (run_generic_tests) compiling 223 chars of Mail::SpamAssassin::Plugin::Check::_body_tests_neg400_1
Aug 14 22:21:29.636 [6233] dbg: rules: run_generic_tests - compiling eval code: body, priority -400
Aug 14 22:21:29.636 [6233] dbg: rules: compiled body tests
Aug 14 22:21:29.636 [6233] dbg: rules: running uri tests; score so far=-0.0001
Aug 14 22:21:29.636 [6233] dbg: rules: flush_evalstr (run_generic_tests) compiling 221 chars of Mail::SpamAssassin::Plugin::Check::_uri_tests_neg400_1
Aug 14 22:21:29.637 [6233] dbg: rules: run_generic_tests - compiling eval code: uri, priority -400
Aug 14 22:21:29.637 [6233] dbg: rules: compiled uri tests
Aug 14 22:21:29.637 [6233] dbg: rules: running body_eval tests; score so far=-0.0001
Aug 14 22:21:29.637 [6233] dbg: rules: run_eval_tests - compiling eval code: 11, priority -400
Aug 14 22:21:29.637 [6233] dbg: rules: running rawbody tests; score so far=-0.0001
Aug 14 22:21:29.637 [6233] dbg: rules: flush_evalstr (run_generic_tests) compiling 229 chars of Mail::SpamAssassin::Plugin::Check::_rawbody_tests_neg400_1
Aug 14 22:21:29.638 [6233] dbg: rules: run_generic_tests - compiling eval code: rawbody, priority -400
Aug 14 22:21:29.638 [6233] dbg: rules: compiled rawbody tests
Aug 14 22:21:29.638 [6233] dbg: rules: running full tests; score so far=-0.0001
Aug 14 22:21:29.638 [6233] dbg: rules: flush_evalstr (run_generic_tests) compiling 258 chars of Mail::SpamAssassin::Plugin::Check::_full_tests_neg400_1
Aug 14 22:21:29.638 [6233] dbg: rules: run_generic_tests - compiling eval code: full, priority -400
Aug 14 22:21:29.638 [6233] dbg: rules: compiled full tests
Aug 14 22:21:29.638 [6233] dbg: rules: running meta tests; score so far=-0.0001
Aug 14 22:21:29.638 [6233] dbg: rules: flush_evalstr (run_generic_tests) compiling 283 chars of Mail::SpamAssassin::Plugin::Check::_meta_tests_neg400_1
Aug 14 22:21:29.639 [6233] dbg: rules: run_generic_tests - compiling eval code: meta, priority -400
Aug 14 22:21:29.639 [6233] dbg: rules: compiled meta tests
Aug 14 22:21:29.639 [6233] dbg: check: running tests for priority: 0
Aug 14 22:21:29.639 [6233] dbg: rules: running head tests; score so far=-0.0001
Aug 14 22:21:29.657 [6233] dbg: rules: flush_evalstr (add_evalstr) compiling 60279 chars of Mail::SpamAssassin::Plugin::Check::_head_tests_0_1
Aug 14 22:21:29.666 [6233] dbg: rules: flush_evalstr (add_evalstr) compiling 60481 chars of Mail::SpamAssassin::Plugin::Check::_head_tests_0_2
Aug 14 22:21:29.674 [6233] dbg: rules: flush_evalstr (add_evalstr) compiling 60404 chars of Mail::SpamAssassin::Plugin::Check::_head_tests_0_3
Aug 14 22:21:29.681 [6233] dbg: rules: flush_evalstr (add_evalstr) compiling 60257 chars of Mail::SpamAssassin::Plugin::Check::_head_tests_0_4
Aug 14 22:21:29.689 [6233] dbg: rules: flush_evalstr (add_evalstr) compiling 60504 chars of Mail::SpamAssassin::Plugin::Check::_head_tests_0_5
Aug 14 22:21:29.698 [6233] dbg: rules: flush_evalstr (run_generic_tests) compiling 58923 chars of Mail::SpamAssassin::Plugin::Check::_head_tests_0_6
Aug 14 22:21:29.703 [6233] dbg: rules: run_generic_tests - compiling eval code: head, priority 0
Aug 14 22:21:29.703 [6233] dbg: rules: compiled head tests
Aug 14 22:21:29.704 [6233] dbg: rules: ran header rule __LAST_EXTERNAL_RELAY_NO_AUTH ==> got hit: "[ ip=65.55.116.91 rdns=blu0-omc3-s16.blu0.hotmail.

Re: Bogus authorize.net statements

2012-08-15 Thread Kevin A. McGrail

On 8/15/2012 11:06 AM, Jim Schueler wrote:
Upon Kevin's recommendation, I upgraded.  Big difference.  'Though 
there's a bit of a retuning penalty.

Woohoo, I was right!  All I did was flip a coin, though ;-)
I get quite a few authorize.net notifications on behalf of various 
ecommerce clients, and this morning I started seeing scam/spam similar 
to the attached.  All share a common marker of embedding a text url 
within an HTML <a> tag containing a different URL.  This seems like an 
obvious marker for spam, I wonder why there isn't a rule for it.


There are many patterns that show up in spam that unfortunately show up 
in ham as well.  If my memory serves me correctly, this just isn't 
indicative of spam or ham.


HOWEVER, some mail systems with good glue like MIMEDefang can do things 
like disable links that do this or redirect them to a CGI that gives the 
end-user some warning, etc.


Regards,
KAM


Bogus authorize.net statements

2012-08-15 Thread Jim Schueler
Upon Kevin's recommendation, I upgraded.  Big difference.  'Though there's
a bit of a retuning penalty.

I get quite a few authorize.net notifications on behalf of various
ecommerce clients, and this morning I started seeing scam/spam similar to
the attached.  All share a common marker of embedding a text url within an
HTML <a> tag containing a different URL.  This seems like an obvious marker
for spam, I wonder why there isn't a rule for it.

Maybe this question is beyond the scope of a mail administrator.  But I'm
interested in the SpamAssassin internals as well.

Thanks!

 -Jim


spamtoday.msg.gz
Description: GNU Zip compressed data