Re: Bayes always reject.

2023-12-13 Thread Jeff Mincy
 > From: Pierluigi Frullani 
 > Date: Wed, 13 Dec 2023 07:49:24 +0100
 > 
 > Hello all,
 >  I'm facing a strange problem.

...
 > tests=BAYES_95,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS,T_SCC_BODY_TEXT_LINE

How did you feed this message into SpamAssassin?
Did you do something to strip off all of the email headers?

For the BAYES_99, as already mentioned you probably need to retrain
bayes, making sure to correct any incorrectly trained email messages.

-jeff


Re: BAYES scores

2023-02-28 Thread Jeff Mincy
 > From: joe a 
 > Date: Tue, 28 Feb 2023 11:37:34 -0500
 > 
 > Curious as to why these scores, apparently "stock" are what they are. 
 > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
 > 
 > Noted in a header this morning:
 > 
 > *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
 > *  [score: 1.]
 > *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
 > *  [score: 1.]
 > 
 > Was this discussed recently?  I added a local score to mollify my sense 
 > of propriety.

Those two rules overlap.  A message with bayes >= 99.9% hits both
rules, so their scores add (3.5 + 0.2 = 3.7 in your header).
BAYES_99 ends at 1.00, not .999.
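
To make the overlap concrete, here is a minimal Python sketch (not SpamAssassin code). The two ranges and the scores 3.5 and 0.2 come from the header quoted above. The name `bayes_hits`, the table layout, and the inclusive range boundaries are assumptions for illustration, and only these two rules are modeled.

```python
# Illustrative only: models just the two rules quoted in the header above.
# Each entry is (low, high, score); treating both ends as inclusive is an assumption.
RULES = {
    "BAYES_99": (0.99, 1.00, 3.5),
    "BAYES_999": (0.999, 1.00, 0.2),
}

def bayes_hits(probability):
    """Return (names of rules that hit, summed score) for a Bayes probability."""
    hits = [name for name, (low, high, _) in RULES.items()
            if low <= probability <= high]
    total = sum(RULES[name][2] for name in hits)
    return hits, round(total, 3)
```

With a probability of 1.0 both rules hit and the scores are summed, which is why BAYES_999 only needs a small score of its own.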
-jeff



Re: Hits on item with " No description available"

2022-01-20 Thread Jeff Mincy
Greg Troxel writes:
 > From: Greg Troxel 
 > Date: Thu, 20 Jan 2022 16:32:53 -0500
 > 
 > I followed my own advice about egrep -R and found this immediately
 > 
 > it's in
 > 
 > 3.004006/updates_spamassassin_org/72_active.cf
 > 
 > and it is
 > 
 > ##{ FSL_HELO_NON_FQDN_1
 > header  FSL_HELO_NON_FQDN_1 X-Spam-Relays-External =~ /^[^\]]+ helo=[a-zA-Z0-9-_]+ /i
 > ##} FSL_HELO_NON_FQDN_1
 > 
 > with score
 > 
 > score FSL_HELO_NON_FQDN_1 2.361 0.001 1.783 0.001

BTW: You can create tags (using Exuberant ctags) for spamassassin rules:

I create the tags using:

ctags -f SPAMASSASSIN_TAGS --langdef=CF --langmap=CF:.cf --languages=CF \
  --regex-CF='/^[ \t]*(header|mimeheader|describe|body|rawbody|full|meta|uri|urirhssub|uridnsbl|urirhsbl|tflags|score|replace_rules)[ \t]+([^ \t]+)/\2/' \
  ~/.spamassassin /var/lib/spamassassin /usr/share/spamassassin

So, I can do Meta-. in Emacs and it goes directly to the
'header FSL_HELO_NON_FQDN_1' definition.

-jeff


Re: DCC whitelisting

2015-06-11 Thread Jeff Mincy
   From: sha...@shanew.net
   Date: Thu, 11 Jun 2015 10:02:59 -0500 (CDT)
   
   On Wed, 10 Jun 2015, John Hardin wrote:
   
   > On Wed, 10 Jun 2015, Shane Williams wrote:
   >
   >>  Two examples that I know are legitimate senders, but get caught by DCC
   >>  (and pyzor in some cases) and other rules that push them over the
   >>  threshold are the SourceForge.net Project of the Month list and
   >>  various Netflix emails to customers (New Arrivals or "we just added a
   >>  show you might like").  In both those cases, the user part of the
   >>  env_from changes, and as I understand it, the DCC Whitelist doesn't
   >>  allow wildcards, so I can't have an entry that matches the server
   >>  part.  Maybe I could be using the "substitute List-ID:" syntax, but
   >>  neither of those has List-ID as a specific header.
   >
   > Can you reliably identify those at the MTA level and tell the SA glue to
   > skip them entirely?
   
   I probably could, but that also seems kludgy.  DCC has a whitelisting
   capability, so why not use it?
   
   Am I misunderstanding what DCC's whitelist is intended for?
   
There are numerous ways to whitelist messages in DCC.
The easiest is to whitelist by mail_host, e.g.
  ok substitute mail_host ecerts.americanexpress.com
Put the entries in /var/dcc/whiteclnt (or wherever you have the files
installed).

The mail_host is the stuff after the @ in the return-path header.

You can test the entry by calling dccproc with the full email
message, eg:
/usr/local/bin/dccproc -d -H -Q -S mail_host -S Sender -S List-ID -S From \
  -l ~/.dcc -w /var/dcc/whiteclnt -R < put_your_email_message_filename_here

You may need to change dcc_conf to make sure that mail_host is
included at startup
  DCCIFD_ARGS="-SHELO -Smail_host -SSender -SList-ID -SFrom"


You can also look at the proof of concept dcc scripts on 
http://www.rhyolite.com/dcc/

  CGI Demonstration
There is a demonstration of the proof of concept CGI scripts that
allow users to maintain individual whitelists and monitor individual
logs of rejected mail at http://www.rhyolite.com/dcc-demo-cgi-bin/
or http://cgi-demo:cgi-d...@www.rhyolite.com/dcc-demo-cgi-bin/. It
requires a user name of cgi-demo and a password of cgi-demo the same
as the user name.

-jeff


Re: effectiveness of DCC checks?

2015-04-14 Thread Jeff Mincy
   From: Quanah Gibson-Mount 
   Date: Tue, 14 Apr 2015 10:59:28 -0700
   
   I've noticed that DCC_CHECK is flagging on tons of items that are clearly 
   not spam.  The most recent hit for me today was a release announcement from 
   the mariadb folks.  Overall, it's a trend I'm routinely seeing where it is 
   flagging a lot of email that clearly isn't spam.  Are others who use DCC 
   seeing similar issues?
   --Quanah

You need to whitelist bulk senders in DCC.   See the DCC manpage:

dcc(8) - Ubuntu Manpage
  Whitelists are the responsibility of DCC clients, since only they know
  which bulk mail they solicited. The only false positives (mail marked
  as "bulk" by a DCC ...

-jeff


Re: SpamRATS RBL?

2015-03-18 Thread Jeff Mincy
   From: "Kevin A. McGrail" 
   Date: Wed, 18 Mar 2015 10:21:39 -0400
   
   Anyone use this RBL or familiar with it? Pros/cons? Efficacy data? 
   regards, KAM
   
I get 5% spam hits on DYNA and 10% on NOPTR.  The SPAM list isn't that
great (< 1% spam and some false hits).

-jeff


Re: Rule to match a blacklist of email addresses.

2015-01-10 Thread Jeff Mincy
   From: Steve 
   Date: Sat, 10 Jan 2015 14:23:36 +
   
   
   I have a domain for which (for historic reasons) I want a catch-all rule 
   to accept email. Until recently, Spamassassin has done a great job of 
   separating the ham from the spam.  Recently, I've been receiving a large 
   number of spam emails which have been misclassified as ham.   These 
   annoying spam emails tend to be addressed to a relatively small number 
   of email addresses at my domain - addresses which have never been 
   used/provided, so should be a very strong indicator of spam.
   
   If I were to have a list of a few dozen email addresses of the form:
   
   bogus_us...@mydomain.com
   onlyspample...@mydomain.com
   ...
   unwantedrubb...@mydomain.com
   
   
   What is the easiest way to implement a rule that checks against such a 
   list - and ups the spam-score if matched?  Would I have to implement a 
   separate rule for each address?

use blacklist_to bogus_us...@mydomain.com ...

This will lead to hits on USER_IN_BLACKLIST_TO

-jeff


Re: Spam messages bypassing SA

2014-10-28 Thread Jeff Mincy
   From: Bob Proulx 
   Date: Mon, 27 Oct 2014 18:37:35 -0600
   
   In the first email:
   
 # The lock file ensures that only 1 spamassassin invocation happens
 # at 1 time, to keep the load down.
 #
 :0fw: spamassassin.lock
 * < 40
 | spamc -x
   
   Kevin A. McGrail wrote:
   > geoff.spamassassin140903 wrote:
   > > Kevin A. McGrail wrote:
   > > > Using procmail without MTA glue is OK for many uses.  I am wondering
   > > > how
   > > > many spamd connections you allow and if you have checked your logs?
   > > >
   > > > I also cannot remember but the uses of a lock file seem odd for
   > > > something that can thread.  Any one know if that is a good idea to
   > > > remove?
   > >
   > > I wonder if you could explain in simple terms what the lockfile achieves
   > > in this situation? Is it even possible that it could cause messages to
   > > bypass SA?
   >
   > I don't think a lockfile achieves anything because it's a call to a
   > program.
   > Procmail has some weird syntax so hopefully someone with some procmail-fu
   > can tell us if a lock on a procmail system call does anything.
   
   Well...  The comment in the example explains what the lock is
   attempting to do.  I think that comment got missed in the follow-ups.
   The lock will restrict spamassassin invocations to one at a time to
   prevent a high system load average running too many spamassassin
   processes all at once.  It will serialize spamassassin invocations to
   one at a time instead of many in parallel.
   
   Normally the MTA will receive incoming messages and will fork a
   process for each incoming connection.  If the outside world connects
   and sends 100 messages all at once then there will be 100 MTA
   processes running in parallel.  If 10,000 all at once then probably
   some MTA process limit will prevent forking that many depending upon
   your configuration.  Each of those will try to send the message
   through procmail and spamassassin in parallel too.  Running 10,000
   procmail processes in parallel probably won't be a problem since it is
   light weight.  However running perl spamassassin 100 or 1,000 times in
   parallel all at once can be quite a resource hit to a moderate system!
   
   By putting the lock in the procmail rule it prevents more than one
   perl spamassassin process from running at a time.  This keeps the
   system from being overloaded due to a spike from the outside world.  I
   want to emphasize that the outside world impacts the system and can
   have an effect of a DDoS just by overwhelming the system with external
   connections.  The MTA has limits to prevent this but while those are
   tuned for normal delivery the MTA maintainers won't know if you are
   running each message through spamassassin and causing a higher load
   because of it.  The default MTA limits are probably too high when
   considering running the message through spamassassin too.
   
   The procmail example comes from the wiki page example:
   
 http://wiki.apache.org/spamassassin/UsedViaProcmail
   
   The wiki page example is launching "spamassassin" not "spamc".  That
   is an important difference to this case.  Someone has changed that to
   spamc in the above and preserved all else including the serialization
   lock.  The spamc talks to a spamd and so the number of parallel
   processes spamd can handle depends upon the spamd configuration.  In
   the spamc use I would be inclined to remove the serialization lock.
   Let it be throttled at the spamd side of things instead.  That would
   make the most sense to me.  Then tune spamd's limits as needed.
   
   In summary I suggest removing the serialization lock from the spamc
   recipe.  Give it a try and monitor system resource utilization.  Start
   tuning at spamd.  Tune other things as needed afterward.
   
 :0fw
 | spamc -x
   
 :0e
 {
   EXITCODE=$?
 }
   
   Bob


I agree with everything you wrote but only when bayes autolearning is
turned off.  Bayes learning holds an exclusive lock to the bayes
database particularly during expiration.

If spamc does bayes autolearning and starts an expiration then other
spamc runs for that user will be locked out of bayes.  At some point
you start getting timeouts at different points in the email delivery
chain.

I have a separate sa-learn (or spamc -L) procmail recipe that has a
serialization lock.

-jeff


Re: Philosophical question on Bayes (was Re: 23_bayes_ignore_header.cf)

2014-10-14 Thread Jeff Mincy
   From: Axb 
   Date: Tue, 14 Oct 2014 23:37:36 +0200
   
   On 10/14/2014 11:08 PM, Adam Katz wrote:
   >> On Tue, 14 Oct 2014 16:10:52 +0200 Axb  wrote:
   >>> and to avoid further discussions of what header may pollute bayes or
   >>> not, I've removed all header entries which are not directly related
   >>> to AV/filter products.
   >
   > On 10/14/2014 07:17 AM, David F. Skoll wrote:
   >> I'm not sure I agree with being too clever about Bayes.  Surely by its
   >> very nature, the Bayes algorithm will itself indicate which tokens
   >> are relevant and which are not?  Isn't that the whole point of Bayes?
   >>
   >> I think being too clever about massaging the data that gets fed to
   >> Bayes may be counter-productive.  For sure, *some* massaging is in order;
   >> a token should be a semantic unit, so something like "www.example.com"
   >> should probably be one token rather than three, but beyond that I wonder
   >> if it's good or not to massage the data?
   >
   > The purpose of bayes_ignore_header is twofold:
   >
   >   1. Prevent inheriting other systems' false positives (ensure better
   >  independence)
   >   2. Prevent relying upon headers that won't exist at delivery time (e.g.
   >  added by the mailbox server)
   >
   > This is why it's so important to ignore other spam engines, which
   > basically fit into both of those categories.
   
   I'd love to have the option (switch) to use Bayes on msg bodies ONLY, 
   though I doubt anybody would be a taker for such a project.
   (I'd even be willing to "$pon$or" such an addition to SA)
   
Wouldn't that be fairly easy to implement  by intercepting the call to
_tokenize_headers in Plugin/Bayes.pm?

  # Tokenize the headers
  my %hdrs = $self->_tokenize_headers ($msg);
  while( my($prefix, $value) = each %hdrs ) {
    push(@tokens, $self->_tokenize_line ($value, "H$prefix:", 0));
  }

-jeff


Re: Bayes Problem

2014-08-28 Thread Jeff Mincy
   From: Julian Brown 
   Date: Thu, 28 Aug 2014 10:46:55 -0500
   
   I work for a company that has lots of mail users.  We use Exim with
   Spamassassin.   My job is to track down this problem.
   
   We are getting complaints of too much spam and have tracked it down, using
   Google, to our bayes files not working correctly.  I do not know if they
   are poisoned or just not working.
   
   When bad spam gets through it is always the same, BAYES_00 -1.9 in the
   headers.   According to what I have googled there is only one thing we can
   do and that is to clear the bayes filters and either allow it to start
   again and possibly retrain.   Each individual has their own bayes filters,
   /home/user/.spamassassin/bayes_*.
   
   Exim version 4.82 #2 built 17-Jul-2014 13:21:53
   SpamAssassin Server version 3.3.2
   CentOS 6.5 64bit
   
   But we are getting a lot of it, not all accounts, so I think this means we
   are getting poisoned or something they are doing is rendering the bayes
   filters non functional.
   
   Here is from one of them from a week or 2 ago:
   
   sa-learn --dump magic
   0.000  0  476  0  non-token data: nspam
   0.000  0  40270  0  non-token data: nham
...
   
   I don't know the significance of the above readout, but all the discussions
   talk about this.
   
   Julian

You need to learn way more spam messages.   You will get the best results
by learning from essentially all messages, as long as the messages are
learned correctly.   In addition to not having enough spam messages
you probably have learned various spam messages as ham.

-jeff


Re: New at SpamAssassin - how to not get headers

2014-08-05 Thread Jeff Mincy
   From: RobertGrimes 
   Date: Tue, 5 Aug 2014 08:50:44 -0700 (PDT)
   
   I don't know if this is fair to ask, but would you (or anyone) care to see
   if the message I am posting should be rated higher than 1.9? I apologize if
   this is not appropriate.
   
   The message is at http://pastebin.com/UZeDtLWZ
   
You need to save the complete original message.  Many of the headers are
missing.
  MISSING_DATE=0.1,MISSING_MID=0.497,NO_RECEIVED=-0.001,NO_RELAYS=-0.25

With sufficient training you should be able to get BAYES_99 +
BAYES_999

-jeff


Re: getting tons of SPAM

2014-07-02 Thread Jeff Mincy
   From: John Hardin 
   Date: Wed, 2 Jul 2014 14:45:07 -0700 (PDT)
   
   On Wed, 2 Jul 2014, motty cruz wrote:
   
   > bayes filter is not running: according to header,
   >
   > X-Virus-Scanned: amavisd-new at fqdn.com
   > X-Spam-Flag: NO
   > X-Spam-Score: -0.009
   > X-Spam-Level:
   > X-Spam-Status: No, score=-0.009 tagged_above=-999 required=5.3
   >tests=[HTML_MESSAGE=0.001, T_RP_MATCHES_RCVD=-0.01]
   >autolearn=unavailable
   > Received: from
   >
   > # sa-learn --dump magic
   > Error Opening file /usr/local/share/GeoIP/GeoIPv6.dat
   > 0.000  0  3  0  non-token data: bayes db version
   > 0.000  0   3338  0  non-token data: nspam
   > 0.000  0    784  0  non-token data: nham
   >
   > any ideas?
   
Note the "autolearn=unavailable" part.
The Bayes database is probably locked doing an expire.

Also, the GeoIP data file should be fixed:
 Error Opening file /usr/local/share/GeoIP/GeoIPv6.dat

   You need to post samples (to pastebin). We can't make comments on what 
   *should* be hitting unless we can see the message itself.

Yep.
-jeff


Re: whitelist_from_spf dbg

2014-05-19 Thread Jeff Mincy
   From: Matus UHLAR - fantomas 
   Date: Mon, 19 May 2014 15:44:30 +0200
   
   > On 17.05.14 14:11, Jeff Mincy wrote:
   > >It would have been easier to figure out why it was matching if the
   > >matching spf entry was printed out, for example something like this:
   > >
   > >May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
   > >May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check
   
   > From: Matus UHLAR - fantomas 
   > Date: Sun, 18 May 2014 18:22:49 +0200
   >  According to the documentation, they are not regexp's (as one could/should
   >  expect):
   >
   >Whitelist and blacklist addresses are now file-glob-style patterns,
   
   On 18.05.14 13:44, Jeff Mincy wrote:
   >The matching whitelist_from_spf entry *@*buy.com is a file glob pattern
   >which matched.  I'm not sure why you are quoting the manual here.  The
   >whitelist entry *@*buy.com is turned into a regexp by add_to_addrlist
   >in SpamAssassin/Conf/Parser.pm which among other things does s/\*+/\.\*/g
   
   I wanted to point out that you (and many other people) could be surprised
   by what you see in the regexp, because what you enter into the
   blacklist/whitelist directive is a glob-style pattern.
   
   Maybe if not the RE, but the directive content was shown in the debug
   output...

Sure, printing out the original glob would be better.   The original
glob isn't currently saved - it would be a little more work.
I could come up with other ideas - such as returning the information
in a tag that could be added to a header.
   
   >   I assume the contents of *_networks is modified before RE matching, so
   >   you'd wonder what is the content...
   
   >Ok, you lost me.  What does the contents of *_networks have to do with
   >the suggestion to print the matching whitelist regexp entry?  Nothing
   >matching *buy.com has been added to *_networks if that is what you are
   >wondering.
   
   sorry, that had to be (black|white)list_*, not *_networks.

Ah.  Yes, the glob style whitelist was modified into a regexp before matching.
   
   -- 
   Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
   Warning: I wish NOT to receive e-mail advertising to this address.
   Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
   Remember half the people you know are below average. 
-jeff


Re: whitelist_from_spf dbg

2014-05-18 Thread Jeff Mincy
   From: Matus UHLAR - fantomas 
   Date: Sun, 18 May 2014 18:22:49 +0200
   
   On 17.05.14 14:11, Jeff Mincy wrote:
   >I just got some spam that was erroneously spf whitelisted hitting
   >WHITELIST_FROM_SPF
   >It took me a while to figure out why it was getting WHITELIST_FROM_SPF
   >but I eventually tracked it down to this whitelist entry:
   >   whitelist_from_spf *@*buy.com
   >The *@*buy.com (obviously) matches *@odysseyshop.ribsbuy.com.
   >
   >It would have been easier to figure out why it was matching if the
   >matching spf entry was printed out, for example something like this:
   >
   >May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
   >May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check
   
   According to the documentation, they are not regexp's (as one could/should
   expect):
   
Whitelist and blacklist addresses are now file-glob-style patterns,
   
The matching whitelist_from_spf entry *@*buy.com is a file glob pattern
which matched.  I'm not sure why you are quoting the manual here.  The
whitelist entry *@*buy.com is turned into a regexp by add_to_addrlist
in SpamAssassin/Conf/Parser.pm which among other things does s/\*+/\.\*/g
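
A rough Python sketch of that glob-to-regexp conversion, to show why the entry matches. This is not the code from Parser.pm: the function name `glob_to_regexp` and the sample sender address are made up for illustration, and the sketch only handles `*` and `?`.

```python
import re

def glob_to_regexp(glob):
    """Turn a file-glob-style address pattern into an anchored regexp string."""
    escaped = re.escape(glob)                      # quote "." and other specials
    escaped = escaped.replace(r"\?", ".")          # ? matches any single character
    escaped = re.sub(r"(?:\\\*)+", ".*", escaped)  # runs of * become .*
    return "^" + escaped + "$"

pattern = glob_to_regexp("*@*buy.com")
# Hypothetical sender, standing in for the obfuscated address in the log.
sender = "someone@odysseyshop.ribsbuy.com"
print(pattern, bool(re.search(pattern, sender, re.IGNORECASE)))
```

Because the second `*` becomes `.*`, anything ending in `buy.com` is accepted. A narrower entry such as `*@*.buy.com` would not match this sender, though whether that suits a given whitelist is a local decision.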


   >sub _wlcheck {
   >  my ($self, $scanner, $param) = @_;
   >  if (defined ($scanner->{conf}->{$param}->{$scanner->{sender}})) {
   >    return 1;
   >  } else {
   >    study $scanner->{sender};
   >    foreach my $regexp (values %{$scanner->{conf}->{$param}}) {
   >      if ($scanner->{sender} =~ qr/$regexp/i) {
   >        ##New dbg output here:
   >        dbg("spf: $param:  $scanner->{sender} matches $regexp entry");
   >        return 1;
   I assume the contents of *_networks is modified before RE matching, so you'd
   wonder what is the content...

Ok, you lost me.  What does the contents of *_networks have to do with
the suggestion to print the matching whitelist regexp entry?  Nothing
matching *buy.com has been added to *_networks if that is what you are
wondering.

-jeff


whitelist_from_spf dbg

2014-05-17 Thread Jeff Mincy


I just got some spam that was erroneously spf whitelisted hitting
WHITELIST_FROM_SPF
It took me a while to figure out why it was getting WHITELIST_FROM_SPF
but I eventually tracked it down to this whitelist entry:
   whitelist_from_spf *@*buy.com
The *@*buy.com (obviously) matches *@odysseyshop.ribsbuy.com.   

It would have been easier to figure out why it was matching if the
matching spf entry was printed out, for example something like this:

May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check

sub _wlcheck {
  my ($self, $scanner, $param) = @_;
  if (defined ($scanner->{conf}->{$param}->{$scanner->{sender}})) {
    # Exact address match
    return 1;
  } else {
    study $scanner->{sender};
    # Otherwise try each stored regexp (converted from the glob entries)
    foreach my $regexp (values %{$scanner->{conf}->{$param}}) {
      if ($scanner->{sender} =~ qr/$regexp/i) {
        ##New dbg output here:
        dbg("spf: $param:  $scanner->{sender} matches $regexp entry");
        return 1;
      }
    }
  }
  return 0;
}

-jeff


Re: help with regex

2014-02-26 Thread Jeff Mincy
   From: "Kevin A. McGrail" 
   Date: Wed, 26 Feb 2014 19:06:34 -0500
   
   On 2/26/2014 6:53 PM, Webmaster wrote:
   > I need a regex to match an alphanumeric string with letters and numbers.
   >
   > example:  48HQZBF404TY2298D1414BB8050022YQ3872444
   >
   > The pattern is defined as:
   >
   > A sequence of alphanumeric characters, letters are upper or lower 
   > case, at least 30 chars long, containing at least 10 numbers.
   >
   > This part is easy enough:  [a-zA-Z0-9]{30,}
   >
   > But I can't figure out how to match only if the string contains at 
   > least 10 numbers. 
   Hmm, I think you might need a plugin for that one.

Can't you do something like this using a look ahead regexp?

(?=[A-Z0-9]{30,})(?:[A-Z]*[0-9]){10,}

The look ahead gets the 30 chars.   Then the next part gets the 10 or
more numbers.   You probably don't need unbounded {10,} but you do need
the {30,} part to be unbounded.
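
A quick way to try the look ahead idea outside SpamAssassin is a small Python check. The pattern is the one above; `re.IGNORECASE` is added here because the request allows lower case letters (in a rule, a trailing /i would play the same role). The helper name `looks_like_token` is made up for this sketch.

```python
import re

# Same pattern as above; re.IGNORECASE stands in for a trailing /i modifier.
TOKEN = re.compile(r"(?=[A-Z0-9]{30,})(?:[A-Z]*[0-9]){10,}", re.IGNORECASE)

def looks_like_token(text):
    """True if text has a run of 30+ alphanumerics containing 10+ digits."""
    return TOKEN.search(text) is not None

print(looks_like_token("48HQZBF404TY2298D1414BB8050022YQ3872444"))
```

The look ahead consumes nothing, so the digit-counting part starts from the same position and has to find its ten digits inside that same alphanumeric run.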

Is the 10 number part really important?

-jeff


Re: re-learning ? was - bayes - large message

2013-04-20 Thread Jeff Mincy
   From: "Joe Acquisto-j4" 
   Date: Sat, 20 Apr 2013 09:10:26 -0400
   
   >>> On 4/19/2013 at 8:33 PM, "Joe Acquisto-j4"  wrote:
    On 4/19/2013 at 8:26 PM, "Joe Acquisto-j4"  wrote:
   >> I thought I had corrected this issue, with someone's assistance, a while
   >> ago:
   >> 
   >> Apr 19 20:21:02.477 [23670] dbg: bayes: expiry completed
   >> Apr 19 20:21:02.477 [23670] info: archive-iterator: skipping large message
   >> Learned tokens from 0 message(s) (0 message(s) examined)
   > 
   > Please ignore.  As much as possible.   I was testing manually and forgot 
   > --mbox on the command line.
   > 
   > However, I can see something is amiss as it is happily accepting spam I 
   > thought had been previously submitted.
   > 
   > joe a.
   
   Ok, I am officially puzzled.   
   
   I set up email addresses on my SA box, to which I and others (they say) send
   ham/spam.  Then I have cron tasks that feed those emails twice daily to bayes.
   And emails the output to my admin mailbox.
   
   I can review those admin messages and see "Learned tokens from n message(s)
   (n message(s) examined)".   Yet, if i resend the bayes food from those dates,
   it appears to re-learn them.   I would expect "Learned tokens from 0
   message(s) (n message(s). . . "
   if it already had seen them.
   
   I have tried this for several dates and get the same result.  What could it
   be?  Not Operator Trouble, surely . . .
   
   joe a

Bayes uses the message id from the email message to remember which
messages it has seen.  If you are really emailing the messages then
you are getting a new message-id which is then learned.  You need to
train on the unadulterated original email message.  You can do this by
attaching the complete email message.  Otherwise you are training
bayes to recognize tokens added by your users during the forwarding
process as a spam indicator.
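
A simplified Python illustration of why a re-sent copy looks new. It keys a "seen" set on the Message-ID header only; how SpamAssassin actually derives and stores its identifier is not shown in this thread, so the `seen` set, the `learn` function, and the sample messages are stand-ins.

```python
from email import message_from_string

seen = set()  # stand-in for the record of already-learned message ids

def learn(raw_message):
    """Return True if the message gets learned, False if it was seen before."""
    msgid = message_from_string(raw_message)["Message-ID"]
    if msgid in seen:
        return False
    seen.add(msgid)
    return True

original = "Message-ID: <abc@example.com>\nSubject: offer\n\nbuy now\n"
# Re-sending wraps the same text in a new message with its own Message-ID.
forwarded = "Message-ID: <xyz@example.org>\nSubject: Fwd: offer\n\nbuy now\n"

print(learn(original), learn(original), learn(forwarded))
```

The second call on the original is skipped, while the forwarded copy is learned again because its Message-ID differs.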

-jeff


Re: rdns in received header

2013-02-21 Thread Jeff Mincy
   From: Matus UHLAR - fantomas 
   Date: Thu, 21 Feb 2013 16:36:18 +0100
   
   >On 2/21/2013 9:03 AM, Jeff Mincy wrote:
   >>Well, I trust the network not to lie.  This is more of an omission
   
   On 21.02.13 10:26, Kevin A. McGrail wrote:
   >Your Clinton-esque logic likely doesn't apply here ;-).  The land of 
   >RFC's works to avoid this type of logic in a language I call 
   >RFC-eeze.
   
   as long as I understand Jeff's original mail, the issue is that his ISP
   stopped providing DNS information in the Received: headers.
   SA does not do lookups on the IPs in Received: (there's iirc one exemption
   related to a buggy software) and if it's not there, it assumes the rDNS does
   not exist, while it does. 

Actually the ISP added a completely new hop, and that hop is not
adding rDNS to the received header.   I had to add the new hop to
trusted_networks and internal_networks.   The new hop looks like it
is scanning the messages using Cloudmark:
 X_CMAE_Category: ...
 X-CNFS-Analysis: ...
 X-CM-Score: ...
 X-Scanned-by: Cloudmark Authority Engine
   
   
   >>I could always whine to Rcn about it, maybe they'll fix it.
   
   >I think that's a good move to at least try!  It truly sounds more 
   >like a DNS error that they might not be aware is occurring.
   
   if the error repeats, I assume Jeff's guess is correct and the ISP just
   turned rDNS lookups off.

Or neglected to turn on the lookups in the first place...

-jeff


Re: rdns in received header

2013-02-21 Thread Jeff Mincy
   From: "Kevin A. McGrail" 
   Date: Thu, 21 Feb 2013 11:07:20 -0500
   
   On 2/21/2013 10:36 AM, Matus UHLAR - fantomas wrote:
   > And how is this ISP's issue related to RFCs? The RFC does not mention the
   > word "trusted"
   A fair point that I didn't explain clearly enough.
   
   The RFCs cover received headers for SMTP and RFCs strive to be black and 
   white.  Discussing things as gray area is an argument that Bill Clinton 
   was famous for but doesn't really hold a place in discussing technology 
   covered by

Which RFC talks about Received headers having rDNS or what information
is supposed to be in the received header?
   
   The point of SA's trusted configuration is that you "trust" the 
   headers.  In this case, he's saying he doesn't trust the headers because 
   they are omitting important information but that they aren't lying, just 
   lying by omission.   To me, this says "I can't trust those headers" 
   and you need to pull back your trust circle which in this case will ruin 
   much of the rules SA uses for pathway analysis (RBLs, rDNS, etc.)
   
   Fixing those headers outside SA or fixing the ISP creating those headers 
   are the real solutions.

There is of course a third option for me - I could turn off the spam
filtering on Rcn email.  Most of the spam is blocked by Rcn, there's
almost no point in trying to filter what little spam is left.


-jeff


Re: rdns in received header

2013-02-21 Thread Jeff Mincy
   From: "Kevin A. McGrail" 
   Date: Thu, 21 Feb 2013 08:46:40 -0500
   
   On 2/20/2013 8:51 PM, Jeff Mincy wrote:
   > ...
   >
   > This leads to various bad things (RDNS_NONE & broken WHITELIST_FROM_RCVD)
   >
   > Is there anything in SpamAssassin that can deal more elegantly with
   > this particular problem?  Perhaps some sort of please_fill_in_rcvd_rdns
   > type option?

   Off the cuff, the point of trusted networks is to say you trust that 
   network's headers.  However, in this case, you don't... I don't really 
   know a fix for this because we have enough issues parsing received 
   headers, let alone re-writing them.

Well, I trust the network not to lie.  This is more of an omission

   How good is your perl and maybe you can solve it in MIMEDefang before 
   it's sent to SA?

Yea, I expected this was going to be the answer.   It would have to be
a procmail filter that calls out to a script.  Yuck.

Thanks for confirming my suspicion.

I could always whine to Rcn about it, maybe they'll fix it.

-jeff 


rdns in received header

2013-02-20 Thread Jeff Mincy


My local ISP (rcn.com) reconfigured their email servers.  The
69.168.97.77 hop does not seem to be doing rdns lookups on the
previous hop.  For example, I get these two received headers at the
trust boundary:

...
Received: from mx.rcn.com ([69.168.97.77])
  by mx06.atw.mail.rcn.net with ESMTP; 20 Feb 2013 17:07:22 -0500
...trust/internal boundary...
Received: from [216.33.63.216] ([216.33.63.216:56326] helo=bigfootinteractive.com)
    by mx.rcn.com (envelope-from <1709130a2layfovcia3kqqzqabnxydzhs2jc2h4yaa...@mail.ameriprise.com>)
    (ecelerity 2.2.3.49 r(42060/42061)) with ESMTP
    id 29/DB-26250-A1945215; Wed, 20 Feb 2013 17:07:22 -0500
...

and the relays are parsed as

  X-Spam-Relay:
   Trusted= ...[ ip=69.168.97.77 rdns=mx.rcn.com helo=mx.rcn.com by=mx06.atw.mail.rcn.net ident= envfrom= intl=1 id= auth= msa=0 ]
   Untrusted=[ ip=216.33.63.216 rdns= helo=bigfootinteractive.com by=mx.rcn.com ident= envfrom=1709130a2layfovcia3kqqzqabnxydzhs2jc2h4yaa...@mail.ameriprise.com intl=0 id=29/DB-26250-A1945215 auth= msa=0 ] ...


This leads to various bad things (RDNS_NONE & broken WHITELIST_FROM_RCVD)

Is there anything in SpamAssassin that can deal more elegantly with
this particular problem?  Perhaps some sort of please_fill_in_rcvd_rdns
type option?

I'm still on 3.2.5 (yes I know it is old).

-jeff


Re: X-Relay-Countries

2013-02-12 Thread Jeff Mincy
   From: Mike Grau 
   Date: Tue, 12 Feb 2013 14:18:33 -0600
   
   > Hmm  I would do something like this (untested):
   > 
   > header RELAY_NOT_US X-Relay-Countries =~ /\b(?!US)[A-Z]{2}\b/
   
   I've had to use, IIRC.
   X-Relay-Countries =~ /\b(?!US|XX)([A-Z]{2})\b/

XX means unknown, mostly due to stale database.  You can update the
IP::Country database.  See:
   http://wiki.apache.org/spamassassin/RelayCountryPlugin
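
To see what the negative look ahead does, the same pattern can be tried in Python. This sketch assumes the header value is a space-separated list of two-letter codes; the helper name `relayed_outside_us` and the sample values are made up for illustration.

```python
import re

# Same pattern as the rule: any two-letter code other than US or XX (unknown).
NOT_US = re.compile(r"\b(?!US|XX)[A-Z]{2}\b")

def relayed_outside_us(relay_countries):
    """True if the X-Relay-Countries value lists a known non-US country."""
    return NOT_US.search(relay_countries) is not None

print(relayed_outside_us("US DE"), relayed_outside_us("XX US"))
```

Excluding XX keeps relays with an unknown country from firing the rule, which matters while the country database is stale.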

-jeff


Re: Spamassassin not parsing email messages

2012-12-28 Thread Jeff Mincy
   From: Sean Tout 
   Date: Fri, 28 Dec 2012 01:10:02 -0800 (PST)
   
   Hi Henrik,
   
   Thank you much for the prompt response and points. I ran the Perl script
   with the code you pasted below, but still got the same report scores for all
   emails! by the way, when I also tried to print contents of the emails using
   $status->get_content_preview(), I got [...] I'm unable to print any portions
   of the email messages using $status = $spamtest->check($mail), however I can
   print any portions using $folder_reader->read_next_email().
   
   Regards,
   
   Sean.
   
Based on the tests that are hit
   --
   -0.0 NO_RELAYS           Informational: message was not relayed via SMTP
    1.2 MISSING_HEADERS     Missing To: header
    0.1 MISSING_MID         Missing Message-Id: header
    1.8 MISSING_SUBJECT     Missing Subject: header
    2.3 EMPTY_MESSAGE       Message appears to have no textual parts and no
                            Subject: text
   -0.0 NO_RECEIVED         Informational: message has no Received headers
    1.4 MISSING_DATE        Missing Date: header
    0.0 NO_HEADERS_MESSAGE  Message appears to be missing most RFC-822
                            headers

you are passing malformed email messages into SpamAssassin.
SpamAssassin cannot find any of the headers.  I'd guess that you
have extraneous junk at the beginning of each message.
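
A quick sanity check before handing a message to SpamAssassin is to look at its very first line. The sketch below is an assumption about what "well formed" means here (a header field or an mbox "From " line first), not SpamAssassin's own logic, and `starts_with_headers` is a hypothetical helper:

```python
import re

# An RFC 822 style header line: a field name without spaces, then a colon.
# mbox-style "From " separator lines are also accepted as a valid start.
HEADER_START = re.compile(r"^(From |[!-9;-~]+:)")

def starts_with_headers(raw: str) -> bool:
    """Return True if the very first line of the message looks like a header."""
    first_line = raw.split("\n", 1)[0]
    return bool(HEADER_START.match(first_line))

good = "Received: from a by b\nSubject: hi\n\nbody\n"
blank_first = "\n" + good                 # a leading blank line ends the headers at once
junk_first = "=== message 1 ===\n" + good  # made-up separator left in by a reader script

for name, raw in (("good", good), ("blank_first", blank_first), ("junk_first", junk_first)):
    print(name, starts_with_headers(raw))
```

If this returns False for the text your script passes in, the header rules above (MISSING_DATE, NO_RECEIVED and friends) are the expected result.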

-jeff


Re: BAYES_00

2012-10-06 Thread Jeff Mincy
   From: Arthur Dent 
   Date: Sat, 06 Oct 2012 11:03:18 +0100
   
   Hello all,
   
   Following a hard drive crash I am rebuilding my small home server on a
   Fedora17 platform.
   
   One of the casualties of the HD crash was my spam corpus. I had a (very
   old) backup which happened to include a previous spam corpus so I used
   that to sa-learn.
   
   All my messages hit BAYES_00. 
   
   I don't have many "fresh" spams. I do not run a SMTP server, I simply
   collect mail for my family and myself from my ISP and other sources
   using fetchmail. My ISP seem to filter most of the really bad stuff so I
   get just a trickle of spams (about 1 per day - if that) but even those
   hit BAYES_00 despite sometimes being identical to a previous FN that had
   already been learned with sa-learn.
   
   Here is my --dump magic: ...
   
   What - if anything - can I do to improve bayes performance?

Get more spam?  Bayes really isn't going to do well with a limited
amount of spam.  It does great when correctly trained using lots of
spam.  But with limited data, not so much.

You could try starting over.  It will take 6 months or so to get to
200 spam messages if you are really getting about 1 per day.  You
could just turn off Bayes.  Or you could just turn Bayes off.  I'm
almost at the same point with my home email, for the same reason.

-jeff


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Jeff Mincy
   From: Ben Johnson 
   Date: Wed, 15 Aug 2012 13:36:08 -0400
   
   Some 99% of the spam that I receive, which is grossly spammy (we're
   talking auto loans, cash advances, dink pills, the whole lot) contains
   "BAYES_00=-1.9" in the tests portion of the X-Spam-Status header.
   
   Might anyone know why? This is a stock installation (Ubuntu package on
   10.04).
   
Most likely you've let autolearn learn a large number of spam messages
as ham.  Any autolearn mistakes need to be corrected.

One or two spam messages with BAYES_00 is not a problem, but a large
number of them indicates a serious problem with learning.   If you
have the old spam messages then you can retrain correctly.  Otherwise
it would probably be best to start over by deleting the bayes database.

   local.cf contains
   
   #   Bayesian classifier auto-learning (default: 1)
   #
   # bayes_auto_learn 1
   
   and I have not overridden the default elsewhere. So, presumably,
   auto-learning is enabled (if that's even relevant).
   
   While I have not trained the Bayesian filter manually to date, how is it
   that the spammiest of the spam is being classified with BAYES_00
   (thereby receiving the score -1.9)? Doesn't BAYES_00 imply that the
   message is almost certainly not spam?

Yes, BAYES_00 says the spam probability is between 0 and 1%.
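
For reference, a sketch of how a Bayes probability maps onto the BAYES_NN rule names. The ranges are as I recall them from the stock rules and the boundary handling is an assumption; check the rule files of your own installation before relying on either:

```python
# (rule name, lower bound, upper bound), recalled from the stock Bayes rules.
BAYES_RULES = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]

def bayes_hits(probability: float) -> list:
    """Return the rule names whose range contains the given spam probability."""
    hits = []
    for name, low, high in BAYES_RULES:
        # Assumed: low end exclusive (except exactly 0), high end inclusive.
        above_low = probability > low or (low == 0.0 and probability == 0.0)
        if above_low and probability <= high:
            hits.append(name)
    return hits

print(bayes_hits(0.005))   # a message Bayes believes is ham
print(bayes_hits(0.9995))  # a message Bayes believes is spam
```

Spam that keeps landing in the first bucket means the database itself thinks those tokens are hammy, which is why mistraining is the usual suspect.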

   http://forums.eukhost.com/f38/problems-spamassassin-bayes-filter-16948/
   
   Outside of the above forum post, search query results for this issue are
   scant.

There have been numerous posts on BAYES.

-jeff


Re: USER_IN_WHITELIST and SPF_FAIL

2012-06-19 Thread Jeff Mincy
   From: RW 
   Date: Tue, 19 Jun 2012 23:43:57 +0100
   
   On Tue, 19 Jun 2012 18:02:28 -0400
   Jeff Mincy wrote:
   
   >From: John Hardin 
   >Date: Tue, 19 Jun 2012 14:44:29 -0700 (PDT)
   >
   >On Tue, 19 Jun 2012, Benny Pedersen wrote:
   >
   >> Den 2012-06-19 22:39, Kevin A. McGrail skrev:
   >>
   >>>  I think that's the concept behind the whitelist_from_spf
   >>
   >> but some use whitelist_from, its nothing new there :=)
   >>
   >> can user_in_whitelist be changed to not have -100 as default
   >> score, or is whitelist_from planned for removements ?
   >
   >It's needed for when none of the other more-strict whitelist
   > options will work, so we can't just get rid of it.
   >
   > True.
   > 
   >I'd suggest instead a lint warning if it is used, alerting the
   > admin that it's discouraged and that it has problems like this and is
   > very easy to spoof.
   >
   > How about creating a different score for whitelist_from that is
   > separate from whitelist_from_rcvd?   For example, whitelist_from could
   > trigger USER_IN_SIMPLE_WHITELIST (or some other variation).   The
   > description of the test could include warnings about how easy
   > it is to spoof whitelist_from.
   
   If used sensibly USER_IN_WHITELIST is probably the most reliable rule we
   have, for the overwhelming majority of addresses it's far more accurate
   than spf based whitelisting. It's not always right to treat users as
   idiots.

Huh?  What do you mean by "used sensibly"?  whitelist_from_rcvd is very
reliable.  whitelist_from is trivial to spoof.  whitelist_from_rcvd
and whitelist_from both trigger USER_IN_WHITELIST.

It is easy to get into trouble using whitelist_from - having a
separate score just for whitelist_from would make identifying the
problem easier for the user.

-jeff


Re: USER_IN_WHITELIST and SPF_FAIL

2012-06-19 Thread Jeff Mincy
   From: John Hardin 
   Date: Tue, 19 Jun 2012 14:44:29 -0700 (PDT)
   
   On Tue, 19 Jun 2012, Benny Pedersen wrote:
   
   > Den 2012-06-19 22:39, Kevin A. McGrail skrev:
   >
   >>  I think that's the concept behind the whitelist_from_spf
   >
   > but some use whitelist_from, its nothing new there :=)
   >
   > can user_in_whitelist be changed to not have -100 as default score, or is 
   > whitelist_from planned for removements ?
   
   It's needed for when none of the other more-strict whitelist options will 
   work, so we can't just get rid of it.
   
True.

   I'd suggest instead a lint warning if it is used, alerting the admin that 
   it's discouraged and that it has problems like this and is very easy to 
   spoof.
   
How about creating a different score for whitelist_from that is
separate from whitelist_from_rcvd?   For example, whitelist_from could
trigger USER_IN_SIMPLE_WHITELIST (or some other variation).   The
description of the test could include warnings about how easy
it is to spoof whitelist_from.

-jeff


Re: Whitelisting with DKIM

2011-10-31 Thread Jeff Mincy
   From: Alex 
   Date: Mon, 31 Oct 2011 12:18:33 -0400
   I have a fedora15 system with sa-3.3.2 and amavisd-2.6.6 and would
   like to whitelist messages like these:
   
   Oct 31 11:19:42 mail02 amavis[3518]: (03518-01-20) SPAM-TAG,
->
   <50...@example.com>, No, score=-4.555 tagged_above=-100 required=5
   tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_IMAGE_RATIO_04=0.61,
   HTML_MESSAGE=0.001, KHOP_RCVD_TRUST=-1.75, LOC_SHORT=0.6,
   
   I've enabled dkim in amavisd.conf:
   
   $enable_dkim_verification = 1;  # enable DKIM signatures verification
   $enable_dkim_signing = 1;  # load DKIM signing code, keys defined by dkim_key
   
...

   Oct 31 11:29:04.733 [7571] info: rules: meta test L_UNVERIFIED_GMAIL
   has dependency 'DKIM_VERIFIED' with a zero score
   Oct 31 11:29:04.837 [7571] dbg: check:
   
tests=DKIM_SIGNED,DKIM_VALID,HTML_IMAGE_RATIO_04,HTML_MESSAGE,KHOP_RCVD_TRUST,LOC_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RCVD_IN_IADB_DK,RCVD_IN_IADB_LISTED,RCVD_IN_IADB_OPTIN,RCVD_IN_IADB_RDNS,RCVD_IN_IADB_SPF,RCVD_IN_UCEPROTECT2,RELAYCOUNTRY_US,RP_MATCHES_RCVD,T_REMOTE_IMAGE,URIBL_GREY
   
   Why does DKIM_VERIFIED have a zero score in 50_scores.cf?

Anybody, including spammers, can do DKIM.  You could give it
a small negative score like -0.5 or so.
   
   I've added the following entries to local.cf, but I suspect this is
   what I'm doing wrong. I don't mean to whitelist all of constant
   contact.
   
   whitelist_from_dkim *@in.constantcontact.com
   whitelist_from_dkim *@bertolini-sales.com
   
   There is a copy of the full message here:
   
   http://pastebin.com/raw.php?i=pmyFn9f9
   
   Thanks so much for any ideas.
   Alex

I think you want 
  whitelist_from_dkim *@bertolini-sales.com  auth.ccsend.com

The auth.ccsend.com comes from the signature line
  DKIM-Signature: ... d=auth.ccsend.com
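
If you want to pull the signing domain out of a saved message rather than read it by eye, something like the following works on the header value. This is a minimal sketch: `dkim_signing_domain` is a hypothetical helper, and every tag in the sample except d= is made up:

```python
import re

def dkim_signing_domain(header_value: str):
    """Return the d= tag of a DKIM-Signature header value, or None."""
    match = re.search(r"(?:^|;)\s*d=([^;\s]+)", header_value)
    return match.group(1).lower() if match else None

# Made-up sample; only d=auth.ccsend.com comes from the message discussed above.
sig = "v=1; a=rsa-sha256; c=relaxed/relaxed; d=auth.ccsend.com; s=example-selector"

domain = dkim_signing_domain(sig)
print(f"whitelist_from_dkim *@bertolini-sales.com {domain}")
```

The printed line is the local.cf entry suggested above, with the second argument taken from the signature instead of the From: address.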

-jeff


Disposition deleted

2011-08-08 Thread Jeff Mincy


Can somebody clue me in on how to match 'Disposition: 
automatic-action/MDN-sent-automatically; deleted'
in a disposition-notification mime attachment?

   --_=_NextPart_001_01CC55E0.440F392C
   Content-Type: message/disposition-notification
   Content-Transfer-Encoding: 7bit

   Final-Recipient: RFC822; kathy.du...@ca.com
   Disposition: automatic-action/MDN-sent-automatically; deleted
   X-MSExch-Correlation-Key: 1CORJJTUYkSeBj5kXwFqLQ==

   --_=_NextPart_001_01CC55E0.440F392C--

I've tried body, rawbody and mimeheader without success:
   mimeheader LOCAL_AUTOMATIC_ACTION Disposition =~ /automatic-action\/MDN-sent-automatically; deleted/

This appears to be some new MS Exchange bounce message.

I'm running 3.2.5 if it matters.

thanks.  
-jeff


RE: SA and Spear Phishing

2011-03-18 Thread Jeff Mincy
   From: Hamad Ali 
   Date: Sat, 19 Mar 2011 00:46:08 +0400
   
   ## back on topic ##
   Anyway, I would highly appreciate any help on spear phishing. A solution, a 
guess, or just if you know whether you get spear phish at all is good 
information for me (I started to think that 99% of mail admins never know that 
they get spear phish because of the extremely high success rate of spear phish).
   PS: Spear Phishing is a problem that I noticed many commercial 
appliances struggle at. This thread is not meant to promote or demote SA, but 
to address a cutting-edge problem that many software classifiers fail to 
address.
   --H

Either I haven't gotten any spear phishing spam, or the spear phishing
spam is being blocked by SpamAssassin.  I'll assume the latter.

If there's some particular type of email that you're having trouble with
the easiest way to get help is to post a complete sample including all
the headers using some pastebin and send the link and the x-spam-status
line that you get on your SpamAssassin to the group.

Otherwise all you're going to get is vague platitudes like "train bayes".

-jeff


Re: new rules - where do i activate them?

2011-03-02 Thread Jeff Mincy
   From: John Hardin 
   Date: Wed, 2 Mar 2011 07:50:38 -0800 (PST)
   
   On Wed, 2 Mar 2011, tr_ust wrote:
   
   
   > This is what my rules look like now:
   >
   > uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/index\/form1.html/
   > score LOCAL_URI_EXAMPLE 200
   > uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/nana\/form1.html/
   > score LOCAL_URI_EXAMPLE 100
   > uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/ontokoros\/form1.html/
   > score LOCAL_URI_EXAMPLE 100
   > uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/tbt\/form1.html/
   > score LOCAL_URI_EXAMPLE 200
   > uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/webadmin\/form1.html/
   > score LOCAL_URI_EXAMPLE 200
   >
   > I took out the last "/" as you suggested...thanks.
   
   You may also want to escape the periods so they are literal matches rather 
   than "match any single character":
   
  uri LOCAL_URI_EXAMPLE /zynetsw\.com\/forms\/use\/webadmin\/form1\.html/
   
   Also, you only have one rule there. Every time you put in another "uri 
   LOCAL_URI_EXAMPLE" you overwrite the previous definition. Change the name 
   of each rule, for example by appending _00 _01 _02, etc.
   
Also, the rules could be combined into a single rule (untested) using
the regexp group (?:index|nana|ontokoros|tbt|webadmin), with the periods
escaped as suggested above:

uri LOCAL_URI_EXAMPLE /zynetsw\.com\/forms\/use\/(?:index|nana|ontokoros|tbt|webadmin)\/form1\.html/
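
Both points (escaping the periods, and folding the five paths into one alternation) can be checked quickly outside SpamAssassin. A sketch in Python, where slashes need no escaping; the lookalike URL is made up for the test:

```python
import re

# Unescaped periods match any single character.
loose = re.compile(r"zynetsw.com/forms/use/webadmin/form1.html")
# Escaped periods plus one alternation covering all five paths.
strict = re.compile(
    r"zynetsw\.com/forms/use/(?:index|nana|ontokoros|tbt|webadmin)/form1\.html"
)

lookalike = "http://zynetswXcom/forms/use/webadmin/form1-html"
paths = ["index", "nana", "ontokoros", "tbt", "webadmin"]
urls = [f"http://zynetsw.com/forms/use/{p}/form1.html" for p in paths]

loose_hits_lookalike = bool(loose.search(lookalike))
strict_hits_lookalike = bool(strict.search(lookalike))
strict_hits_all = all(strict.search(u) for u in urls)
print(loose_hits_lookalike, strict_hits_lookalike, strict_hits_all)
```

A single rule also sidesteps the problem of each repeated "uri LOCAL_URI_EXAMPLE" line overwriting the previous definition.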


-jeff


Re: Trouble whitelisting domain users with whitelist_from_rcvd

2010-07-28 Thread Jeff Mincy
   From: keithcommins 
   Date: Wed, 28 Jul 2010 07:57:43 -0700 (PDT)
   
   Hi there , 
   
   Having some trouble getting this to work correctly , it would seem..
   
   Firstly,  here is my whitelist_from rcvd config from my local.cf file.
   
You can't use whitelist_from_rcvd on internal email.   You don't have
an external relay to match against.   It doesn't matter if your
machine ends in .local or not.

Note the FH_DATE_PAST_20XX.   You probably need to run sa-update sometime
this year.

The ALL_TRUSTED should be enough by itself.   If you need to have a
separate whitelisting you could try something like the following:

meta __TRUSTED_NETWORKS (NO_RELAYS || ALL_TRUSTED)
header __LOCAL_SENDER  From =~ /\...@mydomain\.com/i
meta   FORGED_LOCAL_SENDER (__LOCAL_SENDER && !__TRUSTED_NETWORKS)
score  FORGED_LOCAL_SENDER 0.1
meta   VALID_LOCAL_SENDER (__LOCAL_SENDER && __TRUSTED_NETWORKS)
score  VALID_LOCAL_SENDER -0.1

-jeff


   whitelist_from_rcvd  *...@mydomain.com mydomain.local
   trusted_networks 172.16.1/24 172.16.2/24 172.16.3/24 172.16.5/24 xx.xx.xx.xx
   internal_networks 172.16.1/24 172.16.2/24 172.16.3/24 172.16.5/24
   xx.xx.xx.xx
   
   ( xx.xx.xx.xx represents the outward facing IP of my mail server )
   
   Secondly, below is a header from a test email I sent to myself..
   
   Return-Path: 
   Received: by mydomain.com (CommuniGate Pro PIPE 5.2.12)
 with PIPE id 18275900; Wed, 28 Jul 2010 11:31:13 +0100
   X-TFF-CGPSA-Version: 1.5
   X-TFF-CGPSA-Filter: Scanned
   X-Spam-DCC: wuwien: mail.mydomain.com 1290; Body=1 Fuz1=2 Fuz2=6
   X-Spam-Checker-Version: SpamAssassin 3.2.5 ( 2008-06-10 ) on
mail.mydomain.com
   X-Spam-Level: ***
   X-Spam-Status: No, score=3.8 required=8.0
   tests=ALL_TRUSTED,FH_DATE_PAST_20XX,
HTML_IMAGE_ONLY_20,HTML_MESSAGE autolearn=no version=3.2.5
   X-Spam-Pyzor: 
   Received: from [172.16.3.150] (account some.user [172.16.3.150] verified)
 by mydomain.com (CommuniGate Pro SMTP 5.2.12)
 with ESMTPA id 18275888 for some.u...@mydomain.com; Wed, 28 Jul 2010
   11:31:04 +0100
   Message-ID: <4c500626.7010...@mydomain.com>
   Date: Wed, 28 Jul 2010 11:27:50 +0100
   From: Some User 
   User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
   MIME-Version: 1.0
   To: Some User 
   Subject: (no subject)
   Content-Type: multipart/alternative;
boundary="020906000403080006070205"
   X-EsetId: 90695D289D6435708F6F5D7C933375
   
   This is a multi-part message in MIME format.
   --020906000403080006070205
   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
   Content-Transfer-Encoding: 7bit
   
   Couple of things to note , we use Active Directory which means the FQDN name
   of all our machines end in *.local rather than *.com. Should the
   whitelist_rcvd reflect this in any way??
   Its my understanding that all mails should get a Spam Assassin score of -100
   or thereabouts , thus permanently whitelisting all our domain users. However
   , as you can see this isn't happening??
   
   Is there anything else I should be doing to whitelist my domain users??
   
   
   Thanks in advance for all your help..
   Keith
   -- 
   View this message in context: 
http://old.nabble.com/Trouble-whitelisting-domain-users-with-whitelist_from_rcvd-tp29287372p29287372.html
   Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
   


Re: flat file bayes locking issue and difference errors depending on file locking method

2010-04-14 Thread Jeff Mincy
   From: "R-Elists" 
   Date: Wed, 14 Apr 2010 08:43:21 -0700
   
   having spent the better part of a two days searching as well as trying
   different configs and SA restarts

   we do not have a "hardware horsepower" resource starvation issue
   
   in reference to the error
   
   spamd[30339]: bayes: cannot open bayes databases
   /home/spamd/.spamassassin/bayes_* R/W: lock failed: Interrupted system call

I'd guess that you have a bayes expire running that is either taking
too long or not finishing and leaving lock files around.

Turn off bayes_auto_expire and use bayes_learn_to_journal.
Add a cron job to periodically run sa-learn --sync (say hourly)
and another cron job to run sa-learn --force-expire (daily/weekly).
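
One way to wire that up is a small wrapper that cron calls hourly for the sync and daily or weekly with the expire flag. This is only a sketch: `run_maintenance` and its dry-run default are my own invention, and it assumes sa-learn is on the PATH and the script runs as the user that owns the bayes files:

```python
import subprocess

SYNC_CMD = ["sa-learn", "--sync"]            # fold the journal into the database
EXPIRE_CMD = ["sa-learn", "--force-expire"]  # expire old tokens outside of scanning

def run_maintenance(expire: bool = False, dry_run: bool = True) -> list:
    """Run (or just print) the Bayes maintenance commands; return what was selected."""
    commands = [SYNC_CMD] + ([EXPIRE_CMD] if expire else [])
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands

hourly = run_maintenance()              # hourly cron entry: sync only
weekly = run_maintenance(expire=True)   # daily/weekly cron entry: sync, then expire
```

Pass dry_run=False from the real cron job. Keeping expiry out of spamd means the long exclusive lock happens at a time you choose.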
-jeff


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Jeff Mincy
   From: Keith De Souza 
   Date: Wed, 31 Mar 2010 14:10:50 +0100
   
   Hi
   
   *>> You need to change whatever glue you are using to pass messages to SA,
   >>and skip the scanning for messages larger than your desired threshold.
   
   *Sorry as I'm new to SA can you elaborated what you mean by glue?
   *
   >>That said, IMHO 100k is rather low. Why do you want that particular
   >>threshold?*
   
   Judging from your response, I may be wrong in what I need to do:
   
   Basically I'm having a few errors in my Exim logs from legitamate senders
   not coming through:

300 seconds looks like a timeout.   Something is giving up after
waiting 300 seconds.

Note the autolearn=unavailable.   I'd guess that you are getting
locked out from the Bayes database.   You probably had a Bayes expire
running at the same time.   There should be messages about this in a
log file.

If this is the case you can turn off bayes_auto_expire and run expire
from cron.  You could also try learning to the journal and doing
sa-learn --sync periodically from cron.

-jeff

   
   ===
   2010-03-31 01:22:25 1Nwlbc-0001QS-Ua H=
   host81-136-197-86.in-addr.btopenworld.com (mail.duke.tv) [81.136.197.86] F=<
   l...@dukeandearl.com> temporarily rejected after DATA
   ===
   
   And after checking my SA logs:
   
   ===
   Mar 31 01:25:51 mailserver spamd[5379]: spamd: result: . -4 -
   GENESIS_PHONENUMBER07 *scantime=300.0,size=24337*,
   
user=nobody,uid=8,required_score=3.2,rhost=localhost,raddr=127.0.0.1,rport=42308,mid=<
   c7d27527.8a78%l...@dukeandearl.com 
   >,autolearn=unavailable
   ==
   
   I'm trying to understand why it is taking 300.0 seconds to scan a message
   only 24Kb in size??
   I'm beginning to think that because SA is taking so long to scan the
   message, it is timing out
   and hence Exim is returning a "temporarily rejected after DATA".
   
   My thought so far is to perhaps reduce the message size that SA will scan
   and see if the scan time reduces.
   I may be wrong in my troubleshooting methods but I'm not sure why this is
   happening at present.
   
   Many Thanks
   
   
   
   
   
   
   2010/3/31 Karsten Bräckelmann 
   
   > On Wed, 2010-03-31 at 13:24 +0100, Keith De Souza wrote:
   > > My current sysadmin has now left the company and I'm new to SA and
   > > Exim. [...]
   >
   > > I've read somewhere that the default setting for SA to scan a message
   > > is 500k.
   >
   > That's actually the default for spamc. Messages exceeding the threshold
   > just won't be passed to spamd. SA (and spamd) will check everything it
   > gets passed.
   >
   > > Can I reduce this, so that SA scans messages 100k and below?
   >
   > You need to change whatever glue you are using to pass messages to SA,
   > and skip the scanning for messages larger than your desired threshold.
   >
   > That said, IMHO 100k is rather low. Why do you want that particular
   > threshold?
   >
   >  guenther
   >
   >
   > --
   > char *t="\10pse\0r\0dtu...@ghno
   > \x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
   > main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i c<<=1:
   > (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0;
   > }}}
   >
   >


Re: Off Topic - SPF - What a Disaster

2010-02-23 Thread Jeff Mincy
   From: Martin Gregorie 
   Date: Tue, 23 Feb 2010 22:04:07 +
   
   On Tue, 2010-02-23 at 16:17 -0500, Bowie Bailey wrote:
   
   > The only exception is if you have a strict SPF policy for your own
   > domain, you can use it to reject spam pretending to be from your users.
   Agreed. That's all I use it for. 

The SPF checks in SpamAssassin will score SPF_FAIL without adding
enough points to block the email by itself.   I'm not ready to
outright block email that fails SPF.

   I installed SPF during a backscatter
   storm, which immediately decreased in volume. Since then the periodic
   backscatter showers have got steadily smaller, so it looks as though
   mailservers configured check SPF before bouncing undeliverable mail have
   been getting steadily more common. 
   
Either that or spammers tend to avoid forging domains that have SPF.

-jeff


Re: X-Relay-Countries can stick?

2010-02-12 Thread Jeff Mincy
   From: Robert Nicholson 
   Date: Fri, 12 Feb 2010 19:32:00 -0600
   
   Perhaps my confusion lies in the fact that it looks like headers != metadata?
   
   Is there a way or setting that allows metadata to result in headers in the 
message?

Did you try add_header?

ifplugin Mail::SpamAssassin::Plugin::RelayCountry
add_header all Relay-Country _RELAYCOUNTRY_
endif


Re: MTX plugin created (Re: Spam filtering similar to SPF, less breakage)

2010-02-11 Thread Jeff Mincy
   From: Charles Gregory 
   Date: Thu, 11 Feb 2010 11:55:10 -0500 (EST)
   
   On Wed, 10 Feb 2010, dar...@chaosreigns.com wrote:
   > http://www.chaosreigns.com/mtx/
   
   You know, just for a moment I thought I would take a look, just for 
   curiosity sake, and instead got this moronic jack-ass ATTITUDE page.

Heh.  Using IE 7.0 I get:

  Your browser cannot handle the 9 year old standard required by the
  web page you attempted to access. ...

IE 7.0 displays the page fine, but you have to save the file out as a
plain html file.

-jeff


Re: Rules for not passing SPF

2010-02-02 Thread Jeff Mincy
   From: dar...@chaosreigns.com
   Date: Tue, 2 Feb 2010 18:38:20 -0500
   
   On 02/02, Marc Perkel wrote:
   > Why would you want to catch domains without SPF as SPF has no  
   > relationship to detecting spam?
   
   SPF is entirely about spam.

Actually, SPF is about forgery and forgery is part of the spam problem.
You can still have genuine spam that passes SPF.  Messages that get
SPF_FAIL are forged spam and can be scored or blocked.

   http://www.openspf.org/Introduction
   
   If everyone uses SPF, all we need to block all spam is these rules
   (SPF_NOT_PASS alone should do it), and a blacklist of domains that have
   SPF records including IPs that send spam.

Good luck.   All you need is to get everybody to use SPF and then have
a very large blacklist of spam sending domains.
http://www.rhyolite.com/anti-spam/you-might-be.html
   
   SPF is easy, there's a wizard http://www.openspf.org/, then you paste
   the results into the DNS TXT record for your domain).

SPF is great for what it does.

-jeff


Re: How should this tricky spam be filtered?

2010-01-30 Thread Jeff Mincy
   From: Kārlis Repsons 
   Date: Sat, 30 Jan 2010 17:20:23 +
   
   On Saturday 30 January 2010 15:48:36 Jeff Mincy wrote:
   >  BAYES_99,DCC_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_FIVETEN_SPAM,RCVD_IN_NIX
   > SPAM,RCVD_IN_UCEPROTECT1,RCVD_IN_UCEPROTECT2,RCVD_IN_UCEPROTECT3,BOTNET,BOT
   > NET_BADDNS
   > 
   > Botnet/FIVETEN/NIXSPAM/UCEPROTECT are additional rules added.
   > -jeff
   
   Thanks, just about DCC: why its said to be "not opensource" and commented 
out 
   in a spamassassin default config? Are there any closed-source binaries on a 
   client machine from it? Any such binaries related to SA exist?

DCC is a separately managed project with its own license.  DCC has to be
installed and configured (dccproc and dccifd) outside of SpamAssassin.
After DCC is installed then SpamAssassin has to be configured to use DCC
by loading the plugin.  You can install DCC from source or from various
repositories.   Same is true for razor and pyzor.
-jeff


Re: How should this tricky spam be filtered?

2010-01-30 Thread Jeff Mincy
   From: Ralph Bornefeld-Ettmann 
   Date: Sat, 30 Jan 2010 18:14:10 +0100
   
   Am 30.01.2010 16:48, schrieb Jeff Mincy:
   >From: Kārlis Repsons 
   >Date: Sat, 30 Jan 2010 14:07:16 +
   >
   >On Saturday 30 January 2010 13:54:14 Jeff Mincy wrote:
   >> Retrain the message correctly in Bayes.  Bayes will catch on to this
   >> after a few times.  The subject alone should be a strong enough clue
   >> for bayes (I get BAYES_80 on this partial sample), so it looks like
   >> you are doing only autolearn and not correcting messages that were
   >> learned incorrectly.
   >> -jeff
   >
   > I couldn't figure out how to get an unadulterated version of the
   > message from the spamalyser.com link you posted in a previous message.
   > I tried this
   >  wget -O - -q http://spamalyser.com/v/5cbffujq/original.txt
   > pastebin has a simple way to download the original.
   > Anyway, I eventually got something.

   in the "Raw Message" tab you can get the plain message
   (http://spamalyser.com/v/5cbffujq/raw)
   
Sorry.   Looks more like html here.

  % wget -O - -q  http://spamalyser.com/v/5cbffujq/raw | head
  http://www.w3.org/TR/html4/strict.dtd";>
  
  
  

To get the raw email message, I'd have to write something like 
  wget -O - -q http://spamalyser.com/v/5cbffujq/raw | w3m -dump -T text/html
followed by sed scripts to keep the lines with line numbers and discard
the line numbers.

I guess http://spamalyser.com is looking at the User-Agent: Wget/1.10.2
header.

Maybe there could be a really-raw-without-line-numbers-and-no-html target.

-jeff


Re: How should this tricky spam be filtered?

2010-01-30 Thread Jeff Mincy
   From: Kārlis Repsons 
   Date: Sat, 30 Jan 2010 14:07:16 +
   
   On Saturday 30 January 2010 13:54:14 Jeff Mincy wrote:
   > Retrain the message correctly in Bayes.  Bayes will catch on to this
   > after a few times.  The subject alone should be a strong enough clue
   > for bayes (I get BAYES_80 on this partial sample), so it looks like
   > you are doing only autolearn and not correcting messages that were
   > learned incorrectly.
   > -jeff
   
I couldn't figure out how to get an unadulterated version of the
message from the spamalyser.com link you posted in a previous message.
I tried this
 wget -O - -q http://spamalyser.com/v/5cbffujq/original.txt
pastebin has a simple way to download the original.
Anyway, I eventually got something.

   Hmm, well, I just started with SA, so my filters aren't much trained yet. 
   The thing is, I didn't believe its the Bayes filter to be used for that 
case! 

Bayes is an incredible tool, but only if you let it.  The worst thing
you can do to bayes is mistrain it by learning spam messages as ham.
The other bad thing is to limit the number of messages that it learns from.

   Because I still think, that its not correct to train SA filter on that 
letter 
   as spam! It can contain words, which simply should not contribute to be more 
   "spam", no? Thats not a problem?

No, that is not a problem.
Yes, spam contains words, some of those words will also occur in ham.
Bayes will figure out which words are spammy and which are hammy and
which occur in both.

First start with training Bayes and then check if DCC and network
tests are enabled.

Anyway, I get the following.   
   
BAYES_99,DCC_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_FIVETEN_SPAM,RCVD_IN_NIXSPAM,RCVD_IN_UCEPROTECT1,RCVD_IN_UCEPROTECT2,RCVD_IN_UCEPROTECT3,BOTNET,BOTNET_BADDNS

Botnet/FIVETEN/NIXSPAM/UCEPROTECT are additional rules added.

-jeff


Re: How should this tricky spam be filtered?

2010-01-30 Thread Jeff Mincy
   From: Kārlis Repsons 
   Date: Sat, 30 Jan 2010 13:35:26 +
   
   People,
   perhaps its simple to be done, but I personally would like to know the ways 
to 
   get rid of something like this:

Use pastebin and save the entire message including the headers instead
of forwarding messages like this.

   --  Forwarded Message  --
   ...
   ---
   
   Obviously, the only useful part of all that was the From: name field.

   SA gives just "X-Spam-Status: No, score=-0.7 required=4.0 tests=BAYES_20 
   autolearn=ham version=3.2.5-gr2".
   
   Hopefully a valid question here...

Retrain the message correctly in Bayes.  Bayes will catch on to this
after a few times.  The subject alone should be a strong enough clue
for bayes (I get BAYES_80 on this partial sample), so it looks like
you are doing only autolearn and not correcting messages that were
learned incorrectly.

-jeff


Re: About upgrading

2010-01-11 Thread Jeff Mincy
   From: Alex 
   Date: Sat, 9 Jan 2010 21:13:24 -0500
   
   >   sa-learn --dump magic gives:
   >       0.000          0          3          0  non-token data: bayes db version
   >       0.000          0      57538          0  non-token data: nspam
   >       0.000          0      74876          0  non-token data: nham
   >       0.000          0     166338          0  non-token data: ntokens
   >       0.000          0 1257478501          0  non-token data: oldest atime
   >       0.000          0 1263049426          0  non-token data: newest atime
   >       0.000          0 1263049538          0  non-token data: last journal sync atime
   >       0.000          0 1263044805          0  non-token data: last expiry atime
   >       0.000          0    5529600          0  non-token data: last expire atime delta
   >       0.000          0       1868          0  non-token data: last expire reduction count
   >
   > Your database has 166338 tokens which is larger than the default
   > bayes_expiry_max_db_size 150000.  The last expiration ran this morning
   > at 8:46.  You could try letting the bayes database get larger and turn
   > off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
   > to add something to cron to periodically expire tokens.
   > bayes_auto_expire is fine for lower volumes of email, but can get in
   > the way with higher volumes.
   
   Also, what is the drawback with using auto_expire on larger volumes?
   Is it the locking delay and preventing learning new messages during
   that time? If you were to put it in cron to manually do an expiry, how
   often should it be run?
   
You have an exclusive lock when doing expiration.  Expiration presumably
takes longer on larger volumes, but it is still pretty fast.  
Running expiration daily or weekly should be more than sufficient.

   Is there anything that should be tested prior to making this change,
   or is it pretty benign?

Yes - turning off bayes_auto_expire is pretty benign.
You may not need to make this type of change.   The default options
for bayes work fine for lower email volumes.

   I suppose you could take the ntokens value before, and subtract it
   from the after value to see how many tokens were expired, right? It
   would be interesting to see how many tokens are expired on a regular
   basis, but not sure that's very useful, just interesting.

sa-learn tells you how many tokens were deleted when you do --force-expire,
for example:
 expired old bayes database entries in 152 seconds
 1516428 entries kept, 115692 deleted
 token frequency: 1-occurrence tokens: 73.76%
 token frequency: less than 8 occurrences: 16.19%

-jeff


Re: About upgrading

2010-01-09 Thread Jeff Mincy
   From: Cecil Westerhof 
   Date: Sat, 09 Jan 2010 16:24:56 +0100
   
   Jeff Mincy  writes:
   
   >I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
   >more time with 3.2.5 as it took with 3.0.4. Can this be true?
   >
   >It is not a problem, because it is done by cron-tab, but I am just
   >curious.
   >
   > You can use spamc -L spam/ham to learn messages.  Spamc -L is faster
   > than sa-learn.  The spamd daemon needs to be started with
   > --allow-tell.
   
   That is not really an answer on my question. ;-)

I doubt that bayes learning has slowed down significantly.
I would expect that choice of bayes_store_module, learning to
journal, whether auto expiration runs, and lock contention
matters more than the version.

   But it does not seem to be interesting in my situation.
   First my code has to grow from:
   sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
   to:
   for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
   spamc -L ${typeStr} <${i}
   done
   
   Which is not even enough, because I need to take care of the situation
   that the directory is empty and I need to implement code to show the
   messages delivered by sa-learn.

Oh.  You're learning all of the messages in a directory.  spamc -L is
faster than sa-learn for learning single messages because sa-learn is
a perl script that has to load Mail::SpamAssassin each time.  For a
large directory the slower startup of sa-learn is less of an issue.
sa-learn is fine for doing directories.
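
If you do want to try the spamc -L route on a directory, the loop needs to
cope with an empty directory.  A minimal sketch, assuming spamd was started
with --allow-tell; the learn_dir name and its optional third argument are
made up for illustration:

```shell
# learn_dir KIND DIR [LEARNER]
# Hypothetical helper: feeds every regular file in DIR to "LEARNER -L KIND"
# (LEARNER defaults to spamc) and prints how many files were submitted.
learn_dir() {
    kind=$1
    dir=$2
    learner=${3:-spamc}
    count=0
    for f in "$dir"/*; do
        # An empty directory leaves the unexpanded pattern in $f; skip it.
        [ -f "$f" ] || continue
        "$learner" -L "$kind" < "$f" || return 1
        count=$((count + 1))
    done
    echo "$count message(s) submitted"
}

# Intended use:
#   learn_dir spam "$HOME/Maildir/.SpamDir.$dirStr/cur"
```

Whatever spamc prints for each message still goes to stdout, so the cron
mail shows it along with the final count.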

   With a low level of spam it works, but if it becomes bigger, it does not
   work:
   date
   echo ${echoStr}
   sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
   date
   for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
   spamc -L ${typeStr} <${i}
   done
   echo learned in the new way
   date
   gives:
   za jan  9 16:09:25 CET 2010
   Increase
   Learned tokens from 0 message(s) (45 message(s) examined)
   za jan  9 16:09:40 CET 2010
   learned in the new way
   za jan  9 16:10:00 CET 2010
   
   So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
   code. Beside taking care of an empty directory, I also need to implement
   the feedback given by sa-learn.)
   
You learned tokens from 0 messages and looked at 45 messages.
You had already learned from those 45 messages, so this is
just timing how fast it can do nothing.

   > You can try using bayes_learn_to_journal - and do a separate sa-learn
   > --sync job in cron.   Learning to the journal is faster.
   
   I'll look into that.
   
   
   > Also, What is the size of your database?   Maybe you are spending lots
   > of time doing expires or something.
   
   sa-learn --dump magic gives:
   0.000          0          3          0  non-token data: bayes db version
   0.000          0      57538          0  non-token data: nspam
   0.000          0      74876          0  non-token data: nham
   0.000          0     166338          0  non-token data: ntokens
   0.000          0 1257478501          0  non-token data: oldest atime
   0.000          0 1263049426          0  non-token data: newest atime
   0.000          0 1263049538          0  non-token data: last journal sync atime
   0.000          0 1263044805          0  non-token data: last expiry atime
   0.000          0    5529600          0  non-token data: last expire atime delta
   0.000          0       1868          0  non-token data: last expire reduction count
   
Your database has 166338 tokens, which is larger than the default
bayes_expiry_max_db_size of 150000.  The last expiration ran this morning
at 8:46.  You could try letting the bayes database get larger and turning
off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
to add something to cron to periodically expire tokens.
bayes_auto_expire is fine for lower volumes of email, but can get in
the way with higher volumes.
-jeff


Re: About upgrading

2010-01-09 Thread Jeff Mincy
   From: Cecil Westerhof 
   Date: Sat, 09 Jan 2010 14:39:59 +0100
   
   Cecil Westerhof  writes:
   
   > I did the upgrade. It took some time and there was a slight problem with
   > permissions, but it looks like a successful upgrade. I only changed
   > /dev/null to a real mailbox, because of the 2010 problem. When something
   > like this happens again I now can recover those e-mails.
   
   I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
   more time with 3.2.5 as it took with 3.0.4. Can this be true?
   
   It is not a problem, because it is done by cron-tab, but I am just
   curious.

You can use spamc -L spam/ham to learn messages.  spamc -L is faster
than sa-learn.  The spamd daemon needs to be started with --allow-tell.

You can try using bayes_learn_to_journal - and do a separate sa-learn
--sync job in cron.   Learning to the journal is faster.

Also, what is the size of your database?   Maybe you are spending lots
of time doing expires or something.

-jeff


RE: [sa] Re: FH_DATE_PAST_20XX

2010-01-02 Thread Jeff Mincy
   From: "R-Elists" 
   Date: Sat, 2 Jan 2010 08:33:42 -0800
   
   > > 
   > /20[1-9][0-9]/   --> /20[2-9][0-9]/
   >

   we changed it to this before the update and still had the issue.
   
   so we changed back to the older version and then zero'd the score.
   
   waited for the update
   
   after the update, changed the score to a small positive value to re-enable
   yet the rule is still *hitting* for some reason...
   
   since it is a header rule, what should i start looking at to see where the
   issue is coming from?
   
   somewhere in SA? should i enable special logging?
   
   or, should i check the MTA and it's assigns that deal with the header?

The rule is probably also defined in some other file.
Are you using 00_FVGT_File001.cf?  If so check there.

-jeff


RE: [sa] Re: FH_DATE_PAST_20XX

2010-01-01 Thread Jeff Mincy
   From: "R-Elists" 
   Date: Fri, 1 Jan 2010 15:48:13 -0800
   
   > Cc: Spamassassin users list
   > Subject: Re: [sa] Re: FH_DATE_PAST_20XX
   > 
   > Damn -- mea culpa.  When we fixed the bug in SVN trunk in bug 
   > 5852, I should have immediately backported it to the 3.2.x 
   > sa-update channel when I committed that patch, but I didn't.
   > 
   > It's now fixed in updates, but that won't help the admins 
   > who've been paged to deal with high FP rates on a holiday.  
   > :(  Sorry folks...
   > 
   > --j.
   
   what should the new rule look like?
   
   i mean, i get it, and i think i know, and i even tested it and it was still
   failing even after a restart...
   
   s...

   seriously, i disabled the rule early AM yet when the update came through 4
   or so hours later, i believe it looks exactly the same as when i first
   viewed it early on...

The easiest way to see what has changed since your last sa-update is to
first run sa-update into a scratch directory under /tmp and then diff that
against the installed rules.  The change is trivial but significant...

   root% sa-update -D --updatedir /tmp/updates
   root% diff -r -U 0 /var/lib/spamassassin/3.002005/updates_spamassassin_org /tmp/updates/updates_spamassassin_org

   diff -u -w --minimal -r -U 0 /var/lib/spamassassin/3.002005/updates_spamassassin_org/72_active.cf /tmp/updates/updates_spamassassin_org/72_active.cf
   --- /var/lib/spamassassin/3.002005/updates_spamassassin_org/72_active.cf 2009-07-20 17:01:55.0 -0400
   +++ /tmp/updates/updates_spamassassin_org/72_active.cf   2010-01-01 18:51:10.0 -0500
   @@ -527,7 +527,7 @@
    ##{ FH_DATE_PAST_20XX
   -header   FH_DATE_PAST_20XX  Date =~ /20[1-9][0-9]/ [if-unset: 2006]
   +header   FH_DATE_PAST_20XX  Date =~ /20[2-9][0-9]/ [if-unset: 2006]
    describe FH_DATE_PAST_20XX  The date is grossly in the future.
    ##} FH_DATE_PAST_20XX
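
A quick way to convince yourself what the one-character change does is to
run both patterns against a 2010 Date header.  The date_hits_rule helper
below is made up for this illustration, and grep's extended regex syntax is
only standing in for Perl's; for these two simple patterns they behave the
same.

```shell
# date_hits_rule PATTERN DATE
# Hypothetical helper: succeeds if the Date header value matches the pattern
# (the part between the slashes in the header rule).
date_hits_rule() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

# Intended use:
#   date_hits_rule '20[1-9][0-9]' 'Fri, 1 Jan 2010 15:48:13 -0800' && echo "old rule hits"
#   date_hits_rule '20[2-9][0-9]' 'Fri, 1 Jan 2010 15:48:13 -0800' || echo "new rule does not"
```

If the old pattern is still defined in some other .cf file, it will keep
hitting on current dates no matter what 72_active.cf says.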


-jeff


Re: dkim whitelisting

2009-12-16 Thread Jeff Mincy
   From: LuKreme 
   Date: Wed, 16 Dec 2009 08:23:23 -0700

   I'm adding address book users into the user_prefs files, but without
   the signing domain this is useless and emails for my users are still
   getting tagged up as spam (these in particular score 7-10 points
   without the whitelist). Is there a better way, or do I just have to
   go in and find a DKIM-Signature for each address book entry and then
   parse out the d= field?
   
Yes, you need the d= part.  Note that you should only do this for messages
from domains that sign their mail and pass DKIM with DKIM_VERIFIED.  Adding
whitelist_from_dkim won't do any good if the messages don't hit DKIM_SIGNED
and DKIM_VERIFIED.

   grep -r "^DKIM-Signature:" $HOME/Maildir | awk  '{print $4}' | sed 's/d=//' | sed 's/;//' | sort -u
   
   I dunno, doesn't seem that efficient (oh, and it doesn't work since the d= doesn't appear in the same location in all the headers).
   
If you are going to use sed, you need the entire DKIM-Signature header
as one line.  Use formail to extract the header, for example:
  formail -c -x DKIM-Signature:


NAME
   formail - mail (re)formatter

...
   -c   Concatenate continued fields in the header.  Might be convenient
        when postprocessing mail with standard (line oriented) text
        utilities.
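
Once the header is on one line, the d= tag can be picked out regardless of
where it sits.  A minimal sketch; extract_dkim_domains is a made-up name, and
it assumes stdin carries one unfolded DKIM-Signature value per line, which is
what formail -c -x DKIM-Signature: gives you for a message:

```shell
# extract_dkim_domains: hypothetical helper.
# Splits each signature into its ';'-separated tags, keeps only the tag
# named "d", and prints the unique signing domains.
extract_dkim_domains() {
    tr ';' '\n' |
        sed -n 's/^[[:space:]]*d=[[:space:]]*\([^[:space:]]*\).*$/\1/p' |
        sort -u
}

# Intended use, one message at a time:
#   for m in "$HOME"/Maildir/cur/*; do
#       formail -c -x DKIM-Signature: < "$m"
#   done | extract_dkim_domains
```

Each domain printed is a candidate for the second argument of a
whitelist_from_dkim line, but only for senders whose mail actually verifies.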
-jeff


Re: HABEAS_ACCREDITED SPAMMER

2009-11-24 Thread Jeff Mincy
   From: LuKreme 
   Date: Mon, 23 Nov 2009 17:08:11 -0700
   
   On Nov 23, 2009, at 7:39, Matus UHLAR - fantomas wrote:
   
   > Yes, why to differ between non-abusing and abusing marketers...
   
   We've been through this before. On my mail, habeas is a very strong  
   indicator of spam. It does not appear in legitimate mail.
   
I find it a little hard to believe that your spam is so much different from
my spam.  On my mail, not one single spam message (out of 228k total) hit
HABEAS for all of 2009.  The few messages (480 out of 11k) that hit HABEAS
were all ham, either professional organizations/newsletters, transactions
from places like Vanguard or retail stores that I have a relationship with.

   I don't know who these legitimate marketers are, but I don't feel I'm  
   missing anything.
   
You WILL 'block' legitimate mail.  However, it's your email, so you
can do anything you want.  If you think HABEAS is so bad just set the
HABEAS scores to zero and save the network bandwidth.

-jeff


Re: Timeouts: pyzor and razor2

2009-11-09 Thread Jeff Mincy
   From: Art Greenberg 
   Date: Mon, 9 Nov 2009 17:58:48 -0500 (EST)
   
   Lately I'm seeing a fairly consistent timeout for checks sent to pyzor and 
   razor2 by SA. Up until a couple of days ago this was a very rare 
   occurrence. Seems odd that both of these would have this trouble at the 
   same time. Has anyone else noticed this? Perhaps I changed something here 
   that is causing it 

Pyzor is currently timing out:
  % /usr/bin/pyzor ping
   public.pyzor.org:24441   TimeoutError: 

Razor is fine.
You can increase the timeout if razor is running slow:
   ifplugin Mail::SpamAssassin::Plugin::Razor2
   # How many seconds you wait for razor to complete before you go on without the results
   razor_timeout 15
   endif

-jeff


Re: Another dcc question

2009-10-13 Thread Jeff Mincy
   From: Rick Knight 
   Date: Tue, 13 Oct 2009 09:42:18 -0700
   
   Jeff Mincy wrote:
   >From: Rick Knight 
   >Date: Tue, 13 Oct 2009 08:53:21 -0700
   >
   >Just following this thread because I recently got dcc working also. In 
   >my case I didn't have dcc installed. After installing dcc everything  
   >seems to be working but now I'm wondering about dccifd. On my system 
   >dccproc is in /usr/local/bin but dccifd is in /var/dcc/libexec/. I also 
   >have start-dccifd in /var/dcc/libexec. I assume I need to add 
   >dcc_dccifd_path to my local.cf and then run start-dccifd before starting 
   >spamassassin. Is that correct?
   >
   > Run spamassassin  --test-mode.   If spamassassin finds dccifd it will
   > say 'dccifd is available':
   >
   >   % spamassassin --test-mode --debug dcc < MESSAGE 2>&1 | fgrep dccifd
   >   134:[14145] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
   >   135:[14145] dbg: dcc: dccifd got response: X-DCC-sonic.net-Metrics: pinky 1156; bulk Body=1 Fuz1=many Fuz2=many
   >
   > If you get 'dccifd is not available':
   >   ... dbg: dcc: dccifd is not available: no r/w dccifd socket found
   >
   > then you need to use dcc_dccifd_path or dcc_home
   > -jeff
   >   
   Thanks Jeff,
   
   When I run test-mode I just get this
   
   bash: MESSAGE: No such file or directory
   
   I'm sure I'm just using the command wrong.

Create a file called MESSAGE that contains a complete spam message
with full headers.  The "< MESSAGE" part of the command feeds that file
to spamassassin on stdin.


Re: Another dcc question

2009-10-13 Thread Jeff Mincy
   From: Rick Knight 
   Date: Tue, 13 Oct 2009 08:53:21 -0700
   
   Just following this thread because I recently got dcc working also. In 
   my case I didn't have dcc installed. After installing dcc everything  
   seems to be working but now I'm wondering about dccifd. On my system 
   dccproc is in /usr/local/bin but dccifd is in /var/dcc/libexec/. I also 
   have start-dccifd in /var/dcc/libexec. I assume I need to add 
   dcc_dccifd_path to my local.cf and then run start-dccifd before starting 
   spamassassin. Is that correct?
   
Run spamassassin  --test-mode.   If spamassassin finds dccifd it will
say 'dccifd is available':

  % spamassassin --test-mode --debug dcc < MESSAGE 2>&1 | fgrep dccifd
  134:[14145] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
  135:[14145] dbg: dcc: dccifd got response: X-DCC-sonic.net-Metrics: pinky 1156; bulk Body=1 Fuz1=many Fuz2=many

If you get 'dccifd is not available':
  ... dbg: dcc: dccifd is not available: no r/w dccifd socket found

then you need to use dcc_dccifd_path or dcc_home
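
If you run this check often, the debug output can be boiled down to a single
word.  A sketch only: dcc_status is a made-up name, and it relies on the
"is available" wording of the debug lines shown above.

```shell
# dcc_status: hypothetical helper.
# Reads `spamassassin --debug dcc` output on stdin and prints which DCC
# client SpamAssassin found: dccifd, dccproc, or none.
dcc_status() {
    awk '
        /dcc: dccifd is available/  { ifd = 1 }
        /dcc: dccproc is available/ { proc = 1 }
        END {
            if (ifd)       print "dccifd"
            else if (proc) print "dccproc"
            else           print "none"
        }'
}

# Intended use:
#   spamassassin --test-mode --debug dcc < MESSAGE 2>&1 | dcc_status
```

The "is not available" lines do not match either pattern, so a host with no
socket and no executable reports none.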
-jeff


Re: just enabled DCC

2009-10-13 Thread Jeff Mincy
   From: Dan Schaefer 
   Date: Tue, 13 Oct 2009 10:17:43 -0400
   
   Jeff Mincy wrote:
   >From: Dan Schaefer 
   >Date: Tue, 13 Oct 2009 09:18:44 -0400
   >
   >    Jeff Mincy wrote:
   >>From: Dan Schaefer 
   >>Date: Tue, 13 Oct 2009 08:54:29 -0400
   >>
   >>Jason Bertoch wrote:
   >>> Dan Schaefer wrote:
   >>>> I just enabled DCC yesterday and everything appears to be working 
   >>>> (DCC is registered).  Just to make sure, can someone post an email to 
   >>>> pastebin that has a DCC hit? Thanks.
   >>>>
   >>> IIRC, a message with "test" in the subject and body will match, 
   >>> although your logs should tell you what rules are hitting anyway.
   >>
   >>Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs 
   >>yesterday. "test" in the subject and "test" in the body only triggered 
   >>TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work 
   >>address. Any other suggestions?
   >>
   >> Use
   >>spamassassin --test-mode --debug dcc < somespammsg
   >>
   >> Should print out stuff like:
   >>
   >>08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC
   >>08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
   >>08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many
   >>08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20
   >>
   >>
   >> -jeff
   >>   
   >I followed your instructions and received the following:
   >
   >[1486] dbg: dcc: network tests on, registering DCC
   >[1486] dbg: dcc: dccifd is not available: no r/w dccifd socket found
   >[1486] dbg: dcc: dccproc is not available: no dccproc executable found
   >[1486] dbg: dcc: dccifd and dccproc are not available, disabling DCC
   >
   >After seeing that, I NAT-ed 1023 local to 6277 remote and 6277 remote to 
   >1023 to my mail server in my firewall. I ran the test again and received 
   >the same message.
   >
   > Your firewall is not the problem shown here.  SpamAssassin can't find
   > the dcc socket and executable.  Do you have DCC installed?  If so,
   > where is the dccproc executable?  Did you start dccifd?  Where is the
   > dccifd socket?  SpamAssassin needs to know where they are.  You can
   > use various configuration options to tell SpamAssassin where to look,
   > for example:
   >   ## DCC options (Admin only)
   >   dcc_home /var/lib/dcc
   >   dcc_dccifd_path /var/lib/dcc/dccifd
   >   dcc_path /usr/bin/dccproc
   >
   > -jeff
   >   
   I did just install DCC, but I don't know if it is installed correctly. 
   And of course, DCC's website is down 
   (http://www.rhyolite.com/anti-spam/dcc/). I used the instructions here 
   instead: http://www.freespamfilter.org/FC4.html#_Toc110999211
   
   Now when I run:
   spamassassin -t -D dcc < spam_message
   I get:
   [2955] dbg: dcc: network tests on, registering DCC
   [2955] dbg: dcc: dccifd is not available: no r/w dccifd socket found
   [2955] dbg: dcc: dccproc is available: /usr/bin/dccproc
   [2955] dbg: dcc: opening pipe: /usr/bin/dccproc -H -x 0 -a 74.86.146.6 < 
   /tmp/.spamassassin2955q6p1Yatmp
   [2955] dbg: dcc: got response: X-DCC-SIHOPE-DCC-3-Metrics: 
   pony.performanceadmin.com 1085; Body=2 Fuz1=2 Fuz2=many
   
   and
   2.2 DCC_CHECK  Listed in DCC 
   (http://rhyolite.com/anti-spam/dcc/)
   in the report
   
   Even though the dccifd socket cannot be found, does this appear to be 
   working correctly?

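Yes, dccproc is working.  You got a hit on DCC_CHECK.

You should use dccifd if possible.  It is faster because it runs as a
daemon, so a new dccproc process does not have to be started for each
message.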

-jeff


Re: just enabled DCC

2009-10-13 Thread Jeff Mincy
   From: Dan Schaefer 
   Date: Tue, 13 Oct 2009 09:18:44 -0400
   
   Jeff Mincy wrote:
   >From: Dan Schaefer 
   >Date: Tue, 13 Oct 2009 08:54:29 -0400
   >
   >Jason Bertoch wrote:
   >> Dan Schaefer wrote:
   >>> I just enabled DCC yesterday and everything appears to be working 
   >>> (DCC is registered).  Just to make sure, can someone post an email to 
   >>> pastebin that has a DCC hit? Thanks.
   >>>
   >> IIRC, a message with "test" in the subject and body will match, 
   >> although your logs should tell you what rules are hitting anyway.
   >
   >Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs 
   >yesterday. "test" in the subject and "test" in the body only triggered 
   >TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work 
   >address. Any other suggestions?
   >
   > Use
   >spamassassin --test-mode --debug dcc < somespammsg
   >
   > Should print out stuff like:
   >
   >08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC
   >08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
   >08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many
   >08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20
   >
   >
   > -jeff
   >   
   I followed your instructions and received the following:
   
   [1486] dbg: dcc: network tests on, registering DCC
   [1486] dbg: dcc: dccifd is not available: no r/w dccifd socket found
   [1486] dbg: dcc: dccproc is not available: no dccproc executable found
   [1486] dbg: dcc: dccifd and dccproc are not available, disabling DCC
   
   After seeing that, I NAT-ed 1023 local to 6277 remote and 6277 remote to 
   1023 to my mail server in my firewall. I ran the test again and received 
   the same message.

Your firewall is not the problem shown here.  SpamAssassin can't find
the dcc socket and executable.  Do you have DCC installed?  If so,
where is the dccproc executable?  Did you start dccifd?  Where is the
dccifd socket?  SpamAssassin needs to know where they are.  You can
use various configuration options to tell SpamAssassin where to look,
for example:
  ## DCC options (Admin only)
  dcc_home /var/lib/dcc
  dcc_dccifd_path /var/lib/dcc/dccifd
  dcc_path /usr/bin/dccproc

-jeff


Re: just enabled DCC

2009-10-13 Thread Jeff Mincy
   From: Dan Schaefer 
   Date: Tue, 13 Oct 2009 08:54:29 -0400
   
   Jason Bertoch wrote:
   > Dan Schaefer wrote:
   >> I just enabled DCC yesterday and everything appears to be working 
   >> (DCC is registered).  Just to make sure, can someone post an email to 
   >> pastebin that has a DCC hit? Thanks.
   >>
   > IIRC, a message with "test" in the subject and body will match, 
   > although your logs should tell you what rules are hitting anyway.
   
   Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs 
   yesterday. "test" in the subject and "test" in the body only triggered 
   TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work 
   address. Any other suggestions?
   
Use
   spamassassin --test-mode --debug dcc < somespammsg

Should print out stuff like:

   08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC
   08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
   08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many
   08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20


-jeff


Re: Incresing numbers of DCC_CHECK in ham

2009-10-09 Thread Jeff Mincy
   From: "Jari Fredriksson" 
   Date: Fri, 9 Oct 2009 20:44:09 +0300
   
   > DCC identifies mail that has been sent often. That's what
   > the rule checks for, if other recipients have seen it,
   > too. 
   > 
   > You voluntarily installed DCC, knowing SA will use it.
   > This was on your discretion, and it's your duty to
   > evaluate if it actually is, what you want.
   > 
   > [1] Once, mind you. Which is what DCC does, counting. The
   >"report spam" option in SA reports it differently as
   > many. 
   
   1. So what is DCC good for?

DCC is extremely good at detecting bulk messages.  All or nearly all
spam messages are bulk.

   2. Why does SpamAssassin use it?
   
DCC is a separately configured plugin that does not run unless
configured to do so at each SpamAssassin site.

   3. Should I uninstall DCC if I want to get bulk but not Spam?
   
You should whitelist legitimate bulk email in the DCC whiteclnt file.
Or you could bypass SpamAssassin for mailing lists.  You could lower
the DCC_CHECK score.   Or you could disable or uninstall DCC.

   4. Question 2. again. SpamAssassin is about Spam, but I really need
  to receive bulk, as in mailing lists and newspaper posts. Are
  there people do not want any mail but what their friends send
  them, and that is the purpose of DCC?

If you use DCC you have to whitelist legitimate sources of bulk email.

   5. What special does the "Report to DCC" SpamAssassin function do for our good?
   
Using "Report to DCC" reports the message to DCC with a count of many.
After that everybody else querying the same message will get a count
of many.

-jeff


Re: Incresing numbers of DCC_CHECK in ham

2009-10-09 Thread Jeff Mincy
   From: "Jari Fredriksson" 
   Date: Fri, 9 Oct 2009 19:25:15 +0300
   
   >   Is someone trying to poison DCC?
   > 
   > Yes, you are(:-)   If you haven't whitelisted the
   > mailing list then 
   > you are reporting the email from the mailing list to DCC,
   > which will 
   > increase the DCC count.
   
   Me? But I do report to DCC/Razor2/SpamCop only spam. I do not report ALL my email.

Using spamassassin --report reports the spam message to DCC with a -t
target count of many.

   How does DCC actually work? Is any query a report somehow for DCC?

By default, asking the DCC network about a message also reports it.
From the dccproc man page:
 -Q   only queries the DCC server about the checksums of messages instead
  of reporting and then querying.  This is useful when dccproc is used
  to filter mail that has already been reported to a DCC server by
  another DCC client such as dccm(8).  This can also be useful when
  applying a private white or black list to mail that has already been
  reported to a DCC server.  No single mail message should be reported
  to a DCC server more than once per recipient, such as would happen
  if dccproc is not given -Q when processing a stream of mail that has
  already been seen by a DCC client.  Additional reports of a message
  increase its apparent "bulkness."
-jeff


Re: Incresing numbers of DCC_CHECK in ham

2009-10-09 Thread Jeff Mincy
   From: "Jari Fredriksson" 
   Date: Fri, 9 Oct 2009 17:58:06 +0300
   
   This looks worrying. I have it at 2.2 pts, and not caused any false
   positives, but still, odd. Or is it? I know it is a SPAM indicator
   but a bulk indicator.

Auto correct: That should be 'I know it is *not* a spam indicator but a bulk indicator.'

Yes - it indicates bulk.  Lots of people have seen the email message.
DCC will hit spam, mailing lists, retail email such as Amazon, and
various extremely short email messages.
   
   But it is triggered for example by some mailing list posts which are genuine and not bulk.

What is a genuine mailing list post that is not bulk?  If lots of
people are on the mailing list then the message is, by definition, bulk.
   
   Is someone trying to poison DCC?

Yes, you are(:-)   If you haven't whitelisted the mailing list then
you are reporting the email from the mailing list to DCC, which will
increase the DCC count.   Eventually somebody will report the mailing
list as spam to DCC and you will get a DCC match on the default
many=99.

You have to whitelist the mailing list in the dcc whiteclnt file.

-jeff


Re: Problems with whitelist_from_rcvd

2009-10-02 Thread Jeff Mincy
   From: Igor Bogomazov 
   Date: Fri, 2 Oct 2009 12:34:55 +0400
   
   When I add the string like:
   whitelist_from s...@domain.mail
   it works OK.
   
   But:
   whitelist_from_rcvd s...@domain.mail prefix.domain.mail
   doesn't work.
   
   I've checked rDNS of the prefix.domain.mail with 'host' utility - it's
   all right.
   
   And the appropriate mail header seems to be correct:
   Received: from prefix.domain.mail (unknown [12.12.12.12])
   
   What's the matter?

It is hard to say for sure without seeing actual received headers.

You need to use the last external relay used by the email.

From man Mail::SpamAssassin::Conf:

   whitelist_from_rcvd ...

   This string is matched against the reverse DNS lookup used during
   the handover from the internet to your internal network's mail
   exchangers.  It can either be the full hostname, or the domain
   component of that hostname.  ...

The easiest way to figure out which one to use is to add a Relay
header using:
   add_header all Relay trusted=_RELAYSTRUSTED_, untrusted=_RELAYSUNTRUSTED_

Then get the RDNS from the first untrusted=[ip=... rdns=RDNS ...] relay.
If the RDNS is blank then the whitelist_from_rcvd won't work.

Your internal_networks and trusted_networks need to be set up correctly.
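
Once the Relay header is being added, the RDNS can be pulled out with sed.
This is a sketch under assumptions: first_untrusted_rdns is a made-up name,
and the header is assumed to be unfolded onto one line (for example with
formail -c -x X-Spam-Relay:) with relay entries in the
"[ ip=... rdns=... helo=... ]" form.

```shell
# first_untrusted_rdns: hypothetical helper.
# Reads an unfolded X-Spam-Relay header value on stdin and prints the rdns=
# value of the first relay listed after "untrusted=".  A blank result means
# that relay has no reverse DNS, so whitelist_from_rcvd cannot match it.
first_untrusted_rdns() {
    sed -n 's/^.*untrusted=\[[^]]*rdns=\([^ ]*\) .*$/\1/p' | head -n 1
}

# Intended use:
#   formail -c -x X-Spam-Relay: < msg | first_untrusted_rdns
```

The value printed (or its domain part) is what goes in the second argument
of whitelist_from_rcvd.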

-jeff


Re: Re-running SA on an mbox

2009-09-22 Thread Jeff Mincy
   From: MySQL Student 
   Date: Tue, 22 Sep 2009 15:38:47 -0400
   
   > Try using a local SA setup for stripping the headers. By local, I mean
   > don't use your main production SA - run a separate copy with its own
   > (cut down) configuration and all data base accesses and UBL calls etc
   > turned off.
   
   Much better idea, thanks. Thanks for the script, too.
   Alex

formail can be used to remove headers, for example:

   To remove all Received: fields from the header:
  formail -I Received:

The following should do what you wanted to remove the X-Spam headers:
  formail -I X-Spam < msg

-jeff


Re: Problem with whitelist_from_rcvd and forged reverse lookup

2009-07-30 Thread Jeff Mincy
   From: Sebastian Wiesinger 
   Date: Thu, 30 Jul 2009 17:48:09 +0200
   
   * John Hardin  [2009-07-30 17:39]:
   >> Sendmail -> Procmail -> SA (spamc)
   >
   > Cool, that should be simple.
   >
   > Can you send:
   >
   > (1) the Received: headers from an email generated on that box, and
   >
   > (2) the procmail stanza where you call SA?
   
   I could create a procmail rule that excludes local mail from SA, but I
   would much rather like to whitelist this in spamassassin. Nevertheless
   thanks for your offer to help with procmail.
   
Processing locally generated email that contains spam URLs through
SpamAssassin is not a particularly good idea.  If you have Bayes
enabled then you are training Bayes that spam URLs and whatever
else is in the log files are hammy tokens.

You really do want to skip SpamAssassin processing on messages like
this in your procmail.

-jeff


Re: Pyzor or DCC

2009-07-23 Thread Jeff Mincy
   From: Jonas Eckerman 
   Date: Thu, 23 Jul 2009 15:37:11 +0200
   
   Michael Hutchinson wrote:
   
   >> I saw a test
   >> message with just the word test in the subject hit DCC once.
   
   > That's really strange, I don't see how DCC would fire on the subject..
   > the checksum of the message must have somehow matched some Spam.. 
   
   That's perfectly normal. DCC doesn't just match spam, it matches things 
   that have been seen before. That means it matches bulk, but also anything 
   that happens to be very common for other reasons.

yep.
   
   I imagine that an empty message with the subject "test" is pretty 
   common, so it's perfectly reasonable for DCC to have seen such messages 
   many times before.
   
   I don't know if DCC cares about the subject at all. If it doesn't, it's 
   even more likely that it would hit on an empty test message.
   
   /Jonas

DCC does hit on empty messages.   The empty messages can be
whitelisted.   The DCC distribution includes a fetch-testmsg-whitelist
script:

% head /usr/src/dcc-1.3.111/misc/fetch-testmsg-whitelist
#!/bin/sh

# Fetch a list of "empty" mail messages for whitelisting.  Many free mail
#   service providers add HTML or other text to mail.  That causes empty
#   and nearly empty mail messages to have valid DCC checksums and not be
#   ignored by DCC clients.

# The fetched file can be included in whiteclnt files.  For example, the
#   following line in /var/dccwhiteclnt would whitelist many common
#   empty messages


Re: Pyzor or DCC

2009-07-22 Thread Jeff Mincy
   From: RW 
   Date: Wed, 22 Jul 2009 03:45:50 +0100
   
   On Wed, 22 Jul 2009 13:42:52 +1200
   "Michael Hutchinson"  wrote:
   
   > If you get an E-Mail scoring in both Pyzor and DCC, the chances are
   > very high that the message is Spam. We only deal with around 90,000
   > incoming delivery attempts per day - but have not had a false
   > positive from Pyzor or DCC yet, and have been using both for some
   > years.
   
   That's odd, I get quite a lot of DCC FPs and a few Pyzor FPs on a
   relatively small amount of email. They tend to hit on bulk mail, like
   newsletters, automated mail and very generic mails. I saw a test
   message with just the word test in the subject hit DCC once. 

DCC identifies 'bulk' email.  You have to whitelist desired bulk
email senders in the DCC whiteclnt (etc) file.  The DCC distribution
includes sample scripts like edit-whiteclnt.
   
Pyzor and Razor are easier to use because they do not need this kind of
whitelisting.
Razor and DCC are both highly effective (>80%), and Pyzor is good (>40%).

-jeff


Re: Underscores

2009-07-16 Thread Jeff Mincy
   From: Matt Kettler 
   Date: Thu, 16 Jul 2009 08:52:50 -0400
   
   twofers wrote:
   > How can I pattern match when every word has an underscore after it.
   > Example:
   > This_sentenance_has_an_underscore_after_every_word
   >
   > I'm not really good at Perl pattern matching, but \w and \W see an
   > underscore as a word character, so I'm just not sure what might work.
   >
   > body =~ /^([a-z]+_+)+/i
   >
   > Is that something that will work effectively?

Is this for a spam rule?

   I'd do something like this:
   
   body  MY_UNDERSCORES  /\S+_+\S+_+\S+/
   
   Unless you really want to restrict it to A-Z.
   
   Regardless, ending any regex in + in a SA rule is redundant. Since +
   allows a one-instance match, it will devolve to that. You don't need to
   match the entire line with your rule, so the extra matches are
   redundant. It will match the first instance, and that's all it needs to
   be a match.
   
   Also any regex ending in * should just have its last element removed,
   as that will devolve to a zero-count match.

The /\S+_+\S+_+\S+/ rule will hit lots of technical email, for example
discussions on shell environment variables like LD_LIBRARY_PATH.
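
You can check candidate text against the pattern before turning it into a
rule.  A rough sketch: matches_underscore_rule is a made-up name, and
[^[:space:]] is used as the portable grep -E spelling of Perl's \S.

```shell
# matches_underscore_rule TEXT
# Hypothetical helper: succeeds if TEXT would match /\S+_+\S+_+\S+/.
matches_underscore_rule() {
    printf '%s\n' "$1" |
        grep -Eq '[^[:space:]]+_+[^[:space:]]+_+[^[:space:]]+'
}

# Intended use:
#   matches_underscore_rule 'export LD_LIBRARY_PATH=/usr/local/lib' && echo "hit"
```

Any word with two underscores in it is enough for a hit, which is why
ordinary technical mail trips the rule.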

-jeff


Re: rbl/dnsbl seems to use wrong ip sometimes

2009-07-11 Thread Jeff Mincy
   From: dmy 
   Date: Sat, 11 Jul 2009 14:27:34 -0700 (PDT)
   
   So is there a way to configure that ALL DNS tests just use the last external
   ip address (or at least NOT the first one?). Because to me it doesn't make
   any sense to test the ip people use to deliver messages to their smarthost
   and it produces quite a few false positives on my system...

The smarthost presumably requires authenticated senders.
The smarthost should then add a Received: header that shows that the
sender was authenticated (eg ESMTPSA).   If the smarthost is trusted
then the sender will be trusted.   Various tests are not run on
trusted hosts.
-jeff


   RW-15 wrote:
   > On Sat, 11 Jul 2009 12:52:56 -0700 (PDT)
   > dmy  wrote:
   > 
   >> As far as I understand SpamAssassin is supposed to just check the ip
   >> that directly delivered the email to my server but not the IP the
   >> email is originally from (as that wouldn't make any sense as almost
   >> everyone is using dyn ips...). 
   > 
   > It depends on the test. Most of them run on all addresses outside the
   > trusted network, except for DUL tests and Spamhaus PBL + XBL which run
   > on the last external.

   


Re: USER_IN_WHITELIST Not Scoring

2009-07-10 Thread Jeff Mincy
   From: Karsten Bräckelmann 
   Date: Fri, 10 Jul 2009 23:43:03 +0200
   
   On Fri, 2009-07-10 at 06:53 -0700, an anonymous Nabble user wrote:
   > My local root user sends me nightly emails with mail/spam statistics and
   > information.  Because of the spam information contained in the email, it
   > is sometimes flagged as spam itself.
   > 
   > In my local.cf, I have put the root user's email address in the
   > whitelist_from line, however whenever I send an email as the root user to
   > my legitimate email account, it is not getting scored.
   
 whitelist_from r...@myphonydomain.com
   
   Don't use the un-constrained whitelist_from, unless as a last resort, if
   there's no other way and you cannot use the proper constrained ones,
   like whitelist_from_rcvd.
   
A local root sender should be getting ALL_TRUSTED.  whitelist_from_rcvd
won't work on local email - you need at least one external hop to get the
'rcvd' part.  You could write SpamAssassin rules to look for the messages,
but you probably don't want to AUTOLEARN the messages since any tokens in
the email are probably spam hosts.  As pointed out earlier, this type of
email should bypass SpamAssassin in procmail (etc).

   Anyway, no sample -- no way to point out your issue. Do paste at least
   the headers of such a mail.
   
Yep.

-jeff


Re: Controlling spamd logging from spamc

2009-06-04 Thread Jeff Mincy
   From: Martin Gregorie 
   Date: Tue, 02 Jun 2009 16:54:11 +0100
   
   How difficult would it be to let spamc control spamd's logging output on
   a per-message basis? 
   
   My reason for asking is this: I maintain a body of spam that I use to
   develop and regression test local rules and, during rule development,
   use spamc to pass the test messages through my only copy of spamd. This
   is useful because I can keep the test messages in a normal user on a
   different host from the one running spamd and avoid local configuration
   ambiguities. However, as part of my logwatch environment I run a perl
   program to collect the day's spam stats. I find that the stats are
   meaningless any day I develop and/or regression test rules because, of
   course, spamd is logging these as well as actual mail. I should add
   that, since my ISP introduced greylisting, the 'spam' logged during
   regression testing is at least 12 times the volume of genuine spam
   received that day, so the day's stats are meaningless and so are any
   stats generated by scanning the whole of /var/log/maillog* 
   
   It would be useful for me to be able to disable spamd logging during
   rule testing. 
   
Wouldn't it be easier to run another spamd on a different machine for
rule development and testing?  Or perhaps just running as a different
'test' user, and then ignore log messages for that user in the statistics.

   Would anybody else find this a useful feature too?

I've sometimes wanted the other way - eg get more debugging output for
a particular message.

-jeff


Re: AWL functionality messed up?

2009-05-28 Thread Jeff Mincy
   From: Linda Walsh 
   Date: Wed, 27 May 2009 17:28:35 -0700
   
   Jeff Mincy wrote:
   >From: Linda Walsh 
   >Date: Wed, 27 May 2009 12:48:43 -0700
   >
   >Bowie Bailey wrote:  >
   >At face value, this seems very counter productive.
   >
   > You still aren't understanding the wiki or the AWL scoring or what AWL
   > is trying to do.
   
Ah, but it only seems I'm daft, today...:-)
   
   >If I get spam from 1000 senders, they all end up in my
   >AWL???
   >
   > yes.   every email+ip address pair that sends you email winds up in
   > your AWL with an average score for that pair.  This is ok.
   
GRRRnot so ok in my mindset, but ... and ... errr..
   well that only makes it more confusing, in a way...since I was
   only 99% certain that I'd never gotten any HAM from hostname
   '518501.com' (thinking for a short period that AWL might be classifying
   things by hosts as reliable or not, instead of, or in addition to
   by email-addr), but I'm 99.97% certain I've never gotten any HAM
   from user 'paypal.notify' (at) hostname '5185
   
It is using the relay IP address, not the hostname...
You've most likely received some other spam from this email+ip pair
that was scored as ham.  Hard to tell without seeing the original
scores.
   
   >AWL should only be added to by emails judged to be 'ham' via
   >the feed back mechanisms --, spammers shouldn't get bonuses for
   >being repeat senders...
   >
   > You are getting too attached to the 'whitelist' part of the name.
   > Pretend AWL stands for average weighting list.
   =
Aw...come on.  Isn't the world difficult enough without
   changing white to black or white to weighing?  I mean, we humans
   have enough trouble agreeing on what our symbols, "words" mean in
   relation to concepts and all without ya goin' and redefining perfectly
   good acceptable symbols to mean something else completely and still
   claim it to be some semblance of English.   No wonder most of the
   non-techno-literate humans on this world regard us techies with
   a hint of suspicion regarding the difficulty of problems.  We go around
   redefining words to suit reality and catch the heat when the rest of
   the world doesn't understand our meaning:
   
I don't think AWL is the best possible name for the functionality,
simply because it is easy to misinterpret.

   > AWL isn't whitelisting spammers.   It is pushing the score to the
   > average for that sender.   The sender can have a high average or a low
   > average.   
   ---
An average?  So it keeps the scores of all the past emails of every email
   we ever got sent?  Must just store a weighted average -- otherwise
   the space (hmm...someone said something about 80MB+ auto-whitelist DB
   files?)
   
AWL tracks the total score and the number of messages.
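
As a rough sketch of what "total score and number of messages" buys you:
the mean is total/count, and the current message is pulled part of the way
toward that mean.  The class and function names below are made up for
illustration, and the 0.5 default is my understanding of the stock
auto_whitelist_factor, so treat this as a model of the idea rather than
the plugin's code.

```python
from dataclasses import dataclass

@dataclass
class AwlEntry:
    """Running history for one sender: total score and message count."""
    total: float = 0.0
    count: int = 0

    def mean(self) -> float:
        return self.total / self.count

def awl_adjust(entry: AwlEntry, score: float, factor: float = 0.5) -> float:
    """Pull `score` toward the sender's mean, then record the raw score."""
    adjusted = score
    if entry.count > 0:
        adjusted = score + (entry.mean() - score) * factor
    # Assumption: the history stores the score before the AWL adjustment.
    entry.total += score
    entry.count += 1
    return adjusted

entry = AwlEntry()
first = awl_adjust(entry, 1.0)    # no history yet, score unchanged
second = awl_adjust(entry, 9.0)   # mean so far is 1.0, pulled to 5.0
third = awl_adjust(entry, 2.0)    # mean so far is 5.0, pulled to 3.5
```

Only two numbers per sender are kept, so the database does not grow with
the number of messages a sender has sent, only with the number of senders.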

Why not call it the Historically Based Score Normalizer or
   HBSN module?  Db file could be "historical-norms" or something.
   
Call it BOB if that will help ...
   
   > If the previous email from a particular sender was FP or FN then AWL
   > will have an incorrect average and will wind up doing or trying to do
   > the wrong thing with subsequent email for that sender.
   
Maybe it shouldn't add in the 'average' unless it exceeds
   the 'auto-learning threshold'??  I.e. something like the
   'bayes_auto_learn_threshold_nonspam' for HAM and the
   'bayes_auto_learn_threshold_spam' for SPAM.  Assuming it doesn't
   already do such a thing, it would make a little sense...so as
   not to train it on 'bad data'...
   
Perhaps.   I don't have a particularly strong opinion.

When I run "sa-learn --spam " over a message, can I
   assume (or is it the case) that telling SA, a message was 'spam'
   would assign a sufficiently large value to the 'HBSN' value for that
   sender to reduce any effect of having falsely (if it is likely to happen)
   incorrect value?
   
Nope.

Or might I at least assume that each "sa-learn" over a message
   will modify its AWL score appropriately?
   
No.  You shouldn't assume.  sa-learn doesn't modify the AWL entry.
You can use spamassassin --add-to-blacklist.

   > You can remove addresses using spamassassin --remove-from-whitelist
   
Yes...saw that after visiting the wiki.  Is there a
   --show-whitelist-with-current-scores-and-their-weight switch as well
   (as opposed to one that only showed the addr's in the white list, or only
   showed the non-weighted scores)?
   
If I understand what you are asking for here, you can add an X-Spam-AWL
header that giv

Re: AWL functionality messed up?

2009-05-27 Thread Jeff Mincy
   From: Linda Walsh 
   Date: Wed, 27 May 2009 12:48:43 -0700
   
   Bowie Bailey wrote:
   > Linda Walsh wrote:
   >>
   >> I got a really poorly scored piece of spam -- one thing that stood out
   >> as weird was report claimed the sender was in my AWL.
   > 
   > Any sender who has sent mail to you previously will be in your AWL.  
   > This is probably the most misunderstood component of SA.  Read the wiki.
   > 
   > http://wiki.apache.org/spamassassin/AutoWhitelist
   
   
   At face value, this seems very counter productive.
   
You still aren't understanding the wiki or the AWL scoring or what AWL
is trying to do.

   If I get spam from 1000 senders, they all end up in my
   AWL???
   
Yes.   Every email+ip address pair that sends you email winds up in
your AWL with an average score for that pair.  This is ok.

   WTF?
   
   AWL should only be added to by emails judged to be 'ham' via
   the feed back mechanisms --, spammers shouldn't get bonuses for
   being repeat senders...
   
You are getting too attached to the 'whitelist' part of the name.
Pretend AWL stands for average weighting list.

   How do I delete spammer addresses from my 'auto-white-list'?
   
   (That's just insane..whitelisting spammers?!?!)

AWL isn't whitelisting spammers.   It is pushing the score to the
average for that sender.   The sender can have a high average or a low
average.   

If the previous email from a particular sender was FP or FN then AWL
will have an incorrect average and will wind up doing or trying to do
the wrong thing with subsequent email for that sender.

You can remove addresses using spamassassin --remove-from-whitelist

-jeff


Re: spamassassin runs razor spamc not

2009-05-22 Thread Jeff Mincy
   From: Mester 
   Date: Fri, 22 May 2009 14:52:08 +0200
   
   >>> Check in the ~/.spamassassin/user_prefs file for the user that runs
   >>> amavisd-new.  I know the Mandriva package has that set to 'use_razor2
   >>> 0', so I always have to hunt it down and fix it.
   >> I had no use_razor2 line in the ~amavis/.spamassassin/user_prefs file
   >> but after appending these lines to the file:
   >> use_razor2
   >> razor_config /var/lib/amavis/.razor/razor-agent.conf
   >> and restarting both amavis and spamassassin nothing has changed.
   > 
   > Then, you need to run some of the amavisd-new debugs
   > 
   > I believe the syntax is
   > 
   > [amav...@foo]$ /usr/sbin/amavisd debug-sa plugin
   
   It worked. And now I found the error: amavis user couldn't read the 
   /var/log/razor-agent.log file. I modified the owner of that file to 
   amavis and now I see the check lines in that file.
   
   Is there a way to instruct spamassassin to write the razor, pyzor and 
   dcc check's result to every e-mail's header and not only for spams?

SpamAssassin has add_header that can be used for Pyzor and DCC.

  add_header all Pyzor _PYZOR_
  add_header all DCC _DCCB_; _DCCR_

I don't know how headers are added in amavis.
-jeff


Re: learning from IMAP spam collection

2009-05-19 Thread Jeff Mincy
   From: Michael Monnerie 
   Date: Tue, 19 May 2009 09:34:53 +0200
   
   On Sonntag 17 Mai 2009 Michael Monnerie wrote:
   > Why is it so extremely
   > slow and CPU consuming just to remove any existing markups?
   
   There really seems to be no other way than calling "spamassassin -d" to 
   remove existing markups. I guess I will create an account where a script 
   takes all messages from folder X, removes markup, and stores to Y. Like 
   this, I don't mind too much how long it takes. It's still a PITA that 
   there's no quick "spamc" like way to remove markups.
   
You can use formail to remove headers.  It is way faster than spamassassin -d.
The only trick is listing all of the headers that can be added by
SpamAssassin.

formail -b -t -I X-Spam-Status: -I X-Spam-Flag: -I X-Spam-Checker-Version: \
  -I X-Spam-Rbl: -I X-Spam-Pyzor: -I X-Spam-DCC: -I X-Spam-Level: \
  -I X-Spam-Bayes: -I X-Spam-Relay: -I X-Spam-Report: -I X-Spam-AWL: \
  -I X-Spam-Karma: -I X-Spam-ASN: -I X-Spam-CRM114: \
  -I X-Spam-Relay-Country: <  msg
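
If you would rather not maintain the header list by hand, the same idea
can be sketched in a few lines of Python.  Note the differences: this
removes every header whose name starts with X-Spam- instead of an explicit
list, and, like the formail command, it only handles header markup; it
does not unwrap a message that was rewritten as a report with the original
attached.  The function name is made up for illustration.

```python
import email
from email import policy

def strip_spam_headers(raw: str) -> str:
    """Return the message text with all X-Spam-* headers removed."""
    msg = email.message_from_string(raw, policy=policy.compat32)
    # Copy the header names first; deleting while iterating is unsafe.
    for name in list(msg.keys()):
        if name.lower().startswith("x-spam-"):
            del msg[name]   # removes every occurrence of that header
    return msg.as_string()

# Hypothetical sample message.
raw = (
    "From: a@example.com\n"
    "X-Spam-Status: No, score=1.2 required=5.0\n"
    "X-Spam-Level: *\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
cleaned = strip_spam_headers(raw)
```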

-jeff


Re: whitelist_from_spf

2009-05-14 Thread Jeff Mincy
   From: Alvaro Marín 
   Date: Thu, 14 May 2009 13:30:49 +0200

   It seems that there is a problem resolving DNS records of that domain so I
   want to whitelist it. If I add:
   
   whitelist_from_spf *...@orange.es
   
   It's ignored by SA, as the log says.
   Reviewing code of SPF.pm from SpamAssassin, I see:
   
 # if the message doesn't pass SPF validation, it can't pass an SPF
 ...
   
   So, what is the purpose of this whitelist feature? If the SPF check fails,
   it can't do whitelist?
   
Yes.  The whitelist check is done after the SPF check.  Anybody can
have an SPF record.  SPF just means that the message is genuine = not
forged.  You can get genuine spam.  If you aren't getting SPF_PASS on
the message then whitelist_from_spf won't do anything.

If you are getting SPF_PASS on email from other domains then the
domain you are trying to whitelist probably does not have SPF set up.
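
The order of the two checks is the whole point, so here it is as a tiny
sketch.  This is not the code from SPF.pm; the function name is made up,
and fnmatch is only standing in for SpamAssassin's own address globbing
(it accepts [] character sets that the real thing may not).

```python
from fnmatch import fnmatchcase

def spf_whitelisted(sender: str, spf_pass: bool, patterns: list) -> bool:
    """A whitelist_from_spf style check: no SPF pass means no whitelisting."""
    if not spf_pass:
        # The pattern list is never consulted when SPF did not pass.
        return False
    sender = sender.lower()
    return any(fnmatchcase(sender, pattern.lower()) for pattern in patterns)

# Hypothetical addresses and patterns.
patterns = ["*@example.org"]
passed = spf_whitelisted("News@Example.org", True, patterns)
failed_spf = spf_whitelisted("news@example.org", False, patterns)
other_domain = spf_whitelisted("news@example.net", True, patterns)
```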

-jeff


Re: Properly integrating clamAV into SpamAssassin

2009-05-04 Thread Jeff Mincy
   From: Adam Katz 
   Date: Sun, 03 May 2009 18:47:21 -0400

   I am under the impression that virus checking is *not* that much easier
   than a fully-loaded SA implementation, so therefore spam detection
   should run first.  Counter-point:  online lookups cost bandwidth and
   latency, virus detection doesn't (yet) require any.

Have you timed ClamAV?  It is essentially free.  On my machine I
get >100 ClamAV virus scans per second, which is *way* faster than
SpamAssassin.

   Pause.  Constructive comments and criticisms?

I disagree with your premise...

Time ClamAV and your fully-loaded SA implementation on a set of
messages.   You can time SpamAssassin with and without network tests
for a more complete picture.
   
   Don't get too caught up in the above part, it is all illustrative in
   getting to my question below.
   
   Mail that passes SpamAssassin but gets caught by ClamAV would add value
   to SA's Bayesian and AWL databases and thus the message stands a chance
   at getting caught in the future regardless of its viral content.
   
Feeding virus email into SpamAssassin Bayes seems like a bad idea to
me.  The bayes tokens aren't going to be all that useful for catching
non virus spam.

Adding the virus email into AWL seems somewhat reasonable since any
further email from the same IP address is likely to be another virus
or botnet spam.  However, in practice any botnet spam will use
different random email addresses so you probably won't get any awl
hits on the AWL addresses learned from virus email.

-jeff


Re: Almost no score

2009-05-01 Thread Jeff Mincy
   From: Charles Gregory 
   Date: Fri, 1 May 2009 10:48:00 -0400 (EDT)
   
   Uh, what do these 'ratware' rules trigger on? 

The rules trigger on spam with a particular Message-Id and boundary pattern.

   How effective are they, and what are the chances of false positives?

Last month the KB_RATWARE_OUTLOOK_08 rule hit 21% of spam (4665 hits out
of 21748 spam).   It works great here.
I haven't seen any FP.  Your mileage may vary.

I got the rules from Karsten's sandbox:
http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/kb/70_misc.cf

I would imagine that these rules will eventually show up in sa-update.
-jeff

   
   On Thu, 30 Apr 2009, LuKreme wrote:
   > (single lines)
   > header  KB_RATWARE_OUTLOOK_16  ALL =~ /^Message-Id: <([0-9a-f]{8})\$([0-9a-f]{8})\$.{100,400}boundary="=_NextPart_000__\1\.\2/msi
   > # "
   >
   > header  KB_RATWARE_OUTLOOK_12  ALL =~ /^Message-Id: <([0-9a-f]{8})\$([0-9a-f]{4})[0-9a-f]{4}\$.{100,400}boundary="=_NextPart_000__\1\.\2/msi
   > # "
   >
   > header  KB_RATWARE_BOUNDARY  ALL =~ /^Message-Id: <([0-9a-f]{8})\$[0-9a-f]{8}\$.{100,400}boundary="=_NextPart_000__\1\./msi
   > # "
   >
   > score KB_RATWARE_BOUNDARY 2.0
   > score KB_RATWARE_OUTLOOK_16 0.1
   >
   >
   > -- 
   > Exit, pursued by a bear.
   >


Re: 'anti' AWL

2009-04-29 Thread Jeff Mincy
   From: Charles Gregory 
   Date: Wed, 29 Apr 2009 14:31:22 -0400 (EDT)
   
   
   I just turned off my AWL today, because of FP issues but
   
   > f...@example.com sends me lots of mail.  Say it's over 100.  It's all ham
   > and it all comes from mail.example.com. The AWL for this email couplet is,
   > say, -2.1.  An email comes in from f...@example.com but sent from
   > spam.spammer.tld and scores 7.0.  It gets an additional, say, .42 (20% of
   > the AWL) to score 7.42 instead. Now, another mail from f...@example.com
   > comes in from mail.spam2.tld, this one scores 4.3. It gets a +.42 for
   > missing the match on mail.example.com, and gets a +.288 for missing the
   > match on spam.spammer.tld
   
   This sounds like an attempt to mimic the effects of SPF records by noting 
   which servers send "most" of the mail for a given address. Sadly, this 
   logic breaks down when the spammers 'get there first' and/or send a 
   greater volume of mail than the genuine sender. Admittedly the latter 
   situation is a low probability for any single sender, but in the big 
   picture, *someone* is getting their AWL reputation trashed every time a 
   spammer forges their e-mail.

AWL stores the IP/16 address with the email address.   So your awl
reputation is not being trashed by forged e-mail that comes from a
different IP address.
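
To make the IP/16 point concrete, here is a sketch of how such a lookup
key could be built.  The "address|ip=a.b" layout and the function name are
assumptions for illustration, not a dump of the real database format; the
addresses are documentation-range examples.

```python
def awl_key(address: str, ipv4: str) -> str:
    """Build a lookup key from the sender address and the first two octets."""
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ipv4!r}")
    # Keeping only a.b of a.b.c.d is what "IP/16" means here.
    return f"{address.lower()}|ip={octets[0]}.{octets[1]}"

own_server = awl_key("me@example.com", "192.0.2.10")
same_slash16 = awl_key("me@example.com", "192.0.77.1")
forged = awl_key("me@example.com", "198.51.100.7")
```

Mail forged with your address from an unrelated network gets its own
entry (forged differs from own_server), so it builds its own average
instead of touching yours.  Mail relayed from the same /16 shares the
entry.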
   
   Just this Monday I had a phishing attack against my clients, with *dozens* 
   of e-mails, all purporting to come from ME that came from the *same* 
   server! In this case, as I only send a half dozen messages per month from 
   that account, the spammer would get the favored rating?

Only if the spammer uses the same server that you do.
-jeff


Re: 'anti' AWL

2009-04-28 Thread Jeff Mincy
   From: LuKreme 
   Date: Tue, 28 Apr 2009 08:43:46 -0600
   
   OK, working on my first cup of coffee this morning, so maybe this has  
   potential.
   
   The way the AWL works is by keeping track of the origin of emails,  
   both the address and the server (the top line Received header?) that  
   send the email.  So, let's say that I have a lot of email from
   f...@example.com and that foo's email is sent to me via mail.example.com.
   
   Now, I get an email claiming to be from f...@example.com but sent to me  
   from suspiciousserver.tld, so the AWL is not applied.
   
Your idea will FP anytime anybody adds a new email device or the ISP
changes (etc).

You could use the sagrey plugin to add a point to email from new
email address+IP pairs.

-jeff


Re: AWL and FP's....

2009-04-22 Thread Jeff Mincy
   From: Charles Gregory 
   Date: Wed, 22 Apr 2009 15:56:53 -0400 (EDT)
   
   Just curious if anyone has ever found a 'clean' way to handle the 'damage' 
   done to the AWL when someone's mail is blocked by a false positive, and 
   the sender is stupid enough to keep retrying the offending mail?

Meaning that the first message from the sender was incorrectly marked
as spam and AWL then made sure that all subsequent messages from the
same sender were also marked as spam?
   
The easiest way to fix it is to smash the AWL entry with spamassassin
--add-to-whitelist or remove the AWL entry using --remove-from-whitelist.

   I would rather not turn off AWL. I like the way it gives a negative score 
   bias to frequent correspondents. But is there a (sub)setting to allow me 
   to permit the negative bias, but *not* allow it to add a positive one?
   
Nope - the only thing you can do is set the factor which acts on both
positive and negative scores.

   And while I'm at it, can anyone verify whether 'constantcontact' is really 
   a legit mail service or a spam haven? That's the FP that caused this 
   issue
   
They send email for various organizations.

-jeff


Re: use_auto_whitelist error in lint

2009-04-09 Thread Jeff Mincy
   From: realshock 
   Date: Thu, 9 Apr 2009 06:56:05 -0700 (PDT)
   
   Matt Kettler-3 wrote:
   > Find out where else you've got "use_auto_whitelist 0" in your config,
   > and remove it. 
   > On the plus side, it does confirm you've correctly disabled the plugin.
   
   I searched all over the place, and following your directions, do you think
   this command will find where it is?
   # grep -iR use_auto_whitelist /*

spamassassin -D --lint prints out the config files, eg:
  spamassassin -D --lint 2>&1 | fgrep 'config: read file'

The use_auto_whitelist is in one of those config files.
-jeff


Re: need help - procmail & spamassassin

2009-04-04 Thread Jeff Mincy
   From: "sebast...@debianfan.de" 
   Date: Sun, 05 Apr 2009 01:56:38 +0200
   
   Hello,
   
   i am filtering mails with spamassassin & procmail.
   
This is more of a procmail question, so it doesn't actually belong here.

   The header of message
   
   X-Spam-Level: **
   
   I want to sort mails into some different directories.
   
   10 or more --> directory 10
   9 --> directory 9
   
   and so on
   
Do you really want that many different mail folders?   Wouldn't low>=5,
mid>=10 and high>=15 be sufficient?

   But - nothing happens - the mails are all in the /Maildir/new directory
   why ?
   
The .*\( part.

   :0:
   * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*\*
   Maildir/10/new

You don't need the .* and you don't want the \(

* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*

Also, you can use the numeric score directly.

For example, you can set X_SPAM_SCORE in a procmail recipe to be the
number following score= on the X-Spam-Status line.

 X_IS_SPAM="Unknown"
 X_SPAM_SCORE=""
 :0
 * ^X-Spam-Status: \/.*
 {
   :0
   * ^X-Spam-Status: \/(Yes|No|YES|NO|Skipped)
   { X_IS_SPAM="$MATCH" }

   :0
   * ^X-Spam-Status: (Yes|No|YES|NO)[, ]+(hits|score)=\/([-0-9.]+)
   { X_SPAM_SCORE="$MATCH" }
 }

Then you can write a recipe like this, which matches spam scoring 12.5 or higher.

 SPAM_CUTOFF=12.499
 :0
 * X_IS_SPAM ?? (Yes|YES)
 *$ -$SPAM_CUTOFF ^0
 *$  $X_SPAM_SCORE ^0
 somefolder
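
As I read the recipe, the two ^0 conditions add -SPAM_CUTOFF and the
extracted score to the recipe's running total, and the recipe fires only
if the total ends up positive.  Here is the same logic as a Python sketch,
in case the procmail scoring syntax is hard to follow; the function names
and the "inbox" fallback are made up for illustration.

```python
import re

# Mirrors the procmail condition: Yes/No, then hits= or score=, then a number.
STATUS_RE = re.compile(
    r"^(Yes|No|YES|NO)[, ]+(?:hits|score)=(-?[0-9]+(?:\.[0-9]+)?)"
)

def parse_status(value):
    """Return (is_spam, score) from an X-Spam-Status value, or None."""
    m = STATUS_RE.match(value)
    if m is None:
        return None
    return m.group(1).lower() == "yes", float(m.group(2))

def folder_for(value, cutoff=12.499):
    """Pick a folder: 'somefolder' only for spam scoring above the cutoff."""
    parsed = parse_status(value)
    if parsed is None:
        return "inbox"
    is_spam, score = parsed
    # Same arithmetic as the weighted conditions: score - cutoff must be > 0.
    if is_spam and score - cutoff > 0:
        return "somefolder"
    return "inbox"

high = folder_for("Yes, score=13.1 required=5.0 tests=BAYES_99")
low = folder_for("Yes, score=7.0 required=5.0 tests=BAYES_50")
```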
   
   :0:
   * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*\*
   Maildir/10/new
   
   :0:
   * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*
   X-Spam-Level: ***
   Maildir/9/new

You don't want the extra 'X-Spam-Level: ***' line here.

-jeff


Re: New kind of spam

2009-03-31 Thread Jeff Mincy
   From: Arvid Ephraim Picciani 
   Date: Tue, 31 Mar 2009 12:33:49 +0200
   
   > What do you mean "its impossible to train bayes"?
   
   i was assuming the random text at the end is what causes my bayes db to 
   behave randomly.
   
Random text that occurs only in spam rapidly becomes a spam sign.  Random
spam text that also occurs in ham requires a period of adjustment for
Bayes, but eventually Bayes figures it out.

   > Bayes really can be trained to deal with this message.
   > For example, I get BAYES_95:
   
   well i get 00
   
An occasional spam getting a low bayes score is ok, but lots
of spam getting BAYES_00 is a problem.

Train Bayes with more spam messages and correct any incorrectly learned
messages.

   > After I learn this message the probability increases to BAYES_99
   
   yes, for that specific message.  what exactly is the point of learning 
   specific messages when the next one will be different anyway.

Perhaps you are missing the point of bayes.  I got bayes_95 on the
message before training on the message.  My SpamAssassin hadn't seen
the message before, but it had trained on similar spams.
Bayes breaks the message up into various tokens; some of the tokens from
this or any spam message will be repeated in other spam messages.
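
A toy model may help show why training on one message helps with the next
one.  This is deliberately simplified: the tokenizer is a plain regex and
the per-token probabilities are just averaged, which is not how
SpamAssassin's Bayes combines them.  All names and sample texts are made
up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into a set of lowercase word/hostname-like tokens."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))

class ToyBayes:
    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        """Count each token once per learned message."""
        if is_spam:
            self.spam_tokens.update(tokenize(text))
            self.nspam += 1
        else:
            self.ham_tokens.update(tokenize(text))
            self.nham += 1

    def token_prob(self, token):
        """Spam probability of one token; 0.5 if it was never seen."""
        s = self.spam_tokens[token] / self.nspam if self.nspam else 0.0
        h = self.ham_tokens[token] / self.nham if self.nham else 0.0
        if s + h == 0:
            return 0.5
        return s / (s + h)

    def score(self, text):
        """Toy combination: the plain average of the token probabilities."""
        probs = [self.token_prob(t) for t in tokenize(text)]
        return sum(probs) / len(probs) if probs else 0.5

toy = ToyBayes()
toy.learn("cheap pills visit spaces.live.com now", is_spam=True)
toy.learn("meeting notes for the sort routine", is_spam=False)
unseen_spam = toy.score("new offer at spaces.live.com")
hammy = toy.score("notes for the meeting")
```

The second spam shares only one token with the learned one, yet that is
enough to push it above neutral.  With thousands of learned messages the
random filler washes out and the repeated tokens dominate.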

   >   % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes
   >   X-Spam-Bayes: bayes=1., N=50(47-2+29), ham=(sort, doing), spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, HX-Mozilla-Status2:)
   
   interestingly i dont have that header.
   i'll check docs.

The X-Spam-Bayes header was added with
  add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

-jeff


Re: New kind of spam

2009-03-30 Thread Jeff Mincy
   From: Arvid Ephraim Picciani 
   Date: Wed, 25 Mar 2009 16:59:58 +0100
   
   http://codepad.org/W53onqK9
   
   i gave up on this kind of spam.  its impossible to train bayes and changing 
   too fast to make custom rules. ...
   
What do you mean "its impossible to train bayes"?
Bayes really can be trained to deal with this message.
For example, I get BAYES_95:

  % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes
  X-Spam-Bayes: bayes=0.9679, N=50(29-2+11), ham=(sort, doing), spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, HX-Mozilla-Status2:)

After I learn this message the probability increases to BAYES_99

  % wget -O - -q http://codepad.org/W53onqK9/raw.txt | sa-learn --spam
  Learned tokens from 1 message(s) (1 message(s) examined)
  % sa-learn --sync
  % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes
  X-Spam-Bayes: bayes=1., N=50(47-2+29), ham=(sort, doing), spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, HX-Mozilla-Status2:)

Note that Bayes has determined that UD:spaces.live.com is a spam sign.

The X-Spam-Bayes header is added with
  add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

-jeff


Re: Blacklisting Cyrillic

2009-03-26 Thread Jeff Mincy
   From: Kenneth Porter 
   Date: Thu, 26 Mar 2009 17:22:21 -0700
   
   I'd like to score anything in Windows-1251 fairly high, as I don't expect 
   to get anything legitimate in that charset. How can I read the charset 
   declared in a Subject header, or in a MIME part, for matching in a rule?
   
   The only tools I see are ok_locales and CHARSET_FARAWAY, but those seem 
   like heavy hammers as they blacklist everything and then require me to 
   whitelist what I want. I'd rather the reverse: let me list which codepages 
   to reject.
   
   I tried this rule but it's not firing and I'm not sure why:
   
   describe KP_CYRILLIC Cyrillic code page
   header   KP_CYRILLIC Subject =~ /Windows-1251/
   score    KP_CYRILLIC 0.1
   
Try Subject:raw to inhibit decoding?
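
The reason is that a plain Subject match sees the header after the
MIME encoded-word has been decoded, and by then the charset name is gone.
Something like "header KP_CYRILLIC Subject:raw =~ /windows-1251/i" should
see the undecoded text instead (untested here).  The effect is easy to
show outside SpamAssassin with Python's email package; the subject text
below is made up for illustration.

```python
import base64
from email.header import decode_header, make_header

# Build an encoded-word subject the way a Windows-1251 sender would.
text = "Привет"
encoded = base64.b64encode(text.encode("cp1251")).decode("ascii")
raw_subject = "=?windows-1251?B?" + encoded + "?="

# What a decoding header match would be looking at.
decoded_subject = str(make_header(decode_header(raw_subject)))

# The charset label is only visible in the raw form.
raw_hit = "windows-1251" in raw_subject.lower()
decoded_hit = "windows-1251" in decoded_subject.lower()
```

Matching case-insensitively is worth doing since senders write the
charset label as Windows-1251, windows-1251 and so on.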

-jeff


RE: Server overload, queuing for SA possible?

2009-03-26 Thread Jeff Mincy
   From: Bowie Bailey 
   Date: Thu, 26 Mar 2009 12:07:23 -0500
   
   Jeff Mincy wrote:
   > 
   >If I'm reading the spamc man page correctly, it will wait 5
   >minutes for spamd to process the message, but it will only wait
   >about 3 seconds for a connection to spamd (3 tries with 1 second
   >sleep between them). That's not much of a queue.  Or am I missing
   >something? 
   > 
   > The --connect-retries=retries and --retry-sleep=sleep options control
   > connection attempts.   The connection attempt was successful, you are
   > just waiting for spamd to get around to the message.   If spamd
   > refuses the connection then spamc will retry a few times.
   
   Ok, so spamd will accept the connection and hold onto it until a child
   process is available.  How many connections can spamd queue?

I dunno.  As I recall, on linux the maximum number of connections is
controlled by some kernel limit, probably 4000.  You'll run out of
something else before you get anywhere near this number.  Of course,
messages will start timing out in spamc if they are not processed fast
enough.

-jeff


RE: Server overload, queuing for SA possible?

2009-03-26 Thread Jeff Mincy
   From: Bowie Bailey 
   Date: Thu, 26 Mar 2009 09:55:45 -0500
   
   Jeff Mincy wrote:
   >From: Bowie Bailey 
   >Date: Thu, 26 Mar 2009 08:48:30 -0500
   > 
   >Brian J. Murrell wrote:
   >> On Wed, 2009-03-25 at 15:01 -0400, Michael Scheidell wrote:
   >> >
   >> > Match your MTA processes to the spamd children.  Your MTA will send
   >> > 4xx 'busy now, come back to play later' message.  Let the sending
   >> > MTA queue it back up (or zombies will just go away)
   >>
   >> I don't really see that as a socially responsible action.  If my
   >> mailserver was completely loaded to the point of not even being able
   >> to queue a message, I'd buy pushing back on the sender with a 4xx,
   >> but the reality is that while I may have maxed out my spamd children,
   >> I can likely still receive and queue mail locally.
   >>
   >> The queueing up of mail to spamd really belongs on the local server,
   >> and should not become a burden on sending MTAs.
   > 
   >This really depends on where you are running SA in the delivery process.
   >> I'm kinda gathering that this is not possible within spamassassin
   >> itself.  Probably in fact it is for at least some MTAs but how to
   >> achieve it becomes MTA specific and OT here.
   > 
   >SA is not capable of any sort of queuing.  If you need that, you
   >will have to make your MTA do it one way or another.
   > 
   > The spamassassin executable doesn't queue - it just starts up a new
   > process each time it scans a message.
   > 
   > However, spamd queues connections when all of the children are busy
   > processing messages.
   > 
   > From the spamd man page:
   > 
   >-m number , --max-children=number
   >This option specifies the maximum number of children to
   >spawn. Spamd will spawn that number of children, then
   >sleep in the background until a child dies, wherein it
   >will go and spawn a new child.
   > 
   >Incoming connections can still occur if all of the
   >children are busy, however those connections will be
   >queued waiting for a free child.  The minimum value is 1,
   > the default value is 5. 
   > 
   > As long as messages are processed reasonably quickly everything will
   > be fine.  If spamd takes too long to process messages then the MTA
   > will start timing out (like 2-10 minutes).  What happens then is up to
   > the MTA.
   > -jeff
   
   Ok, it does queue connections, but that is very limited.  This thread is
   specifically talking about what happens when spamd is taking too long.
   
Yes.   We were getting away from that issue.

The machine may not have enough resources to run the number of spamd
children.  A caching name server helps with throughput.   Some more
details about the machine could be useful as well as details on what
else is happening on the machine when the spamd queue backs up.

   If I'm reading the spamc man page correctly, it will wait 5 minutes for
   spamd to process the message, but it will only wait about 3 seconds for
   a connection to spamd (3 tries with 1 second sleep between them).
   That's not much of a queue.  Or am I missing something?

The --connect-retries=retries and --retry-sleep=sleep options control
connection attempts.   The connection attempt was successful, you are
just waiting for spamd to get around to the message.   If spamd
refuses the connection then spamc will retry a few times.
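
In other words the retry options only cover the "connection refused"
case.  A sketch of that loop, with the 3 tries and 1 second sleep quoted
above as defaults; the function and parameter names are made up, and
this is not spamc's actual code.

```python
import time

def connect_with_retries(connect, retries=3, sleep_seconds=1.0,
                         sleeper=time.sleep):
    """Call connect() up to `retries` times, sleeping between refusals."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # Once this returns, the retry logic is finished; any waiting
            # for a free child happens on the open connection.
            return connect()
        except ConnectionRefusedError as exc:
            last_error = exc
            if attempt < retries:
                sleeper(sleep_seconds)
    raise last_error

# Hypothetical server that refuses twice and then accepts.
calls = []
def flaky_connect():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionRefusedError("spamd not listening")
    return "connected"

naps = []
result = connect_with_retries(flaky_connect, sleeper=naps.append)
```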

-jeff


RE: Server overload, queuing for SA possible?

2009-03-26 Thread Jeff Mincy
   From: Bowie Bailey 
   Date: Thu, 26 Mar 2009 08:48:30 -0500
   
   Brian J. Murrell wrote:
   > On Wed, 2009-03-25 at 15:01 -0400, Michael Scheidell wrote:
   > > 
   > > Match your MTA processes to the spamd children.  Your MTA will send
   > > 4xx 'busy now, come back to play later' message.  Let the sending
   > > MTA queue it back up (or zombies will just go away)
   > 
   > I don't really see that as a socially responsible action.  If my
   > mailserver was completely loaded to the point of not even being able
   > to queue a message, I'd buy pushing back on the sender with a 4xx,
   > but the reality is that while I may have maxed out my spamd children,
   > I can likely still receive and queue mail locally.
   > 
   > The queueing up of mail to spamd really belongs on the local server,
   > and should not become a burden on sending MTAs.
   
   This really depends on where you are running SA in the delivery process.
   > I'm kinda gathering that this is not possible within spamassassin
   > itself.  Probably in fact it is for at least some MTAs but how to
   > achieve it becomes MTA specific and OT here.
   
   SA is not capable of any sort of queuing.  If you need that, you will
   have to make your MTA do it one way or another.

The spamassassin executable doesn't queue - it just starts up a new
process each time it scans a message.

However, spamd queues connections when all of the children are busy
processing messages.

From the spamd man page:

   -m number , --max-children=number
   This option specifies the maximum number of children to spawn.
   Spamd will spawn that number of children, then sleep in the
   background until a child dies, wherein it will go and spawn a new
   child.

   Incoming connections can still occur if all of the children are
   busy, however those connections will be queued waiting for a free
   child.  The minimum value is 1, the default value is 5.

As long as messages are processed reasonably quickly everything will
be fine.  If spamd takes too long to process messages then the MTA
will start timing out (like 2-10 minutes).  What happens then is up to
the MTA.
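
To get a feel for how long a queued connection waits, here is a toy Python model. It assumes every scan takes the same time and that connections are served in arrival order; it is not how spamd is implemented, and `simulate_queue` is a made-up name.

```python
import heapq


def simulate_queue(arrivals, service_time, max_children=5):
    """Return how long each message waits before a child starts scanning it.

    arrivals: sorted arrival times in seconds.
    service_time: seconds per scan (assumed constant for simplicity).
    max_children: number of spamd children (5 is the documented default).
    """
    free_at = [0.0] * max_children  # when each child next becomes free
    heapq.heapify(free_at)
    waits = []
    for arrival in arrivals:
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service_time)
    return waits
```

With two children and 10 second scans, three simultaneous messages give waits of 0, 0 and 10 seconds. The wait grows with every extra message, which is how a backlog eventually runs into the MTA timeout.
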
-jeff


Re: Spam Assassin White List

2009-03-24 Thread Jeff Mincy
   From: Matus UHLAR - fantomas 
   Date: Tue, 24 Mar 2009 15:30:23 +0100
   
   On 23.03.09 21:58, dsh979 wrote:
   > I did not realise that items listed on the white list or the black list
   > would still be subject to the operation/analysis of the SpamAssassin Rules.
   
   All rules are processed unless you play with the ShortCircuit plugin. Beware of
   that: it may render SA useless if you don't know what you are doing.
   
   > You have asked why I have set the required score to 100.  Lengthy
   > explanation (sorry).  I have done this to prevent SpamAssassin from
   > inserting SpamWarnings into the header/body of the relevant email.
   
   There's report_safe option to configure that.
   
Also rewrite_header 
   
   > Q:How can I list items/users on a "white list" or a "black list" without the
   > lists (and items) being the subject of further analysis by the SpamAssassin
   > Rules (and therefore obtaining the same score for each item on the relevant
   > list, irrespective of the operation of the SpamAssassin Rules, that is
   > -100=white list items & +100 = black list items)?
   
   I somehow do not understand this question.

He wants the white/black lists to run first and then short circuit.
So anybody in the whitelist gets a score of -100 and anybody in the
blacklist gets a score of +100.  This can probably be done with the
ShortCircuit plugin and setting the priority of the rules so that they
run first.

Black lists aren't all that useful for stopping spam.   The email
addresses are forged in spam.

-jeff


Re: negative scores for spam

2009-03-23 Thread Jeff Mincy
   From: Chris Barnes 
   Date: Mon, 23 Mar 2009 11:14:37 -0500
   
   Jeff Mincy wrote:
   
   > Yow.  The negative scoring bayes rules are extremely reliable when well
   > trained.  Ham messages are not trying to evade the filter.  Defeating
   > bayes with poison is mostly a myth.  The random garbage might work the
   > first time but not the second time as long as you are training these
   > messages as spam.  If you are getting lots of BAYES_00 hits on spam
   > then the problem is almost certainly incorrect training where spam
   > messages were incorrectly learned as ham.
   
   Fair enough.
   
   But the problem remains.  A simple glance at this list shows that this 
   happens often enough to be a fairly common problem.
   
   The question is:  How does one fix the problem after it occurs?

The way to fix the problem is to relearn any incorrectly learned
messages.  So any spam message that was incorrectly learned as ham,
either automatically or manually, needs to be correctly relearned as
spam using sa-learn.  You should also learn as spam any spam messages
that hit BAYES_00, or anything less than BAYES_50.  Likewise, learn
as ham any ham messages hitting BAYES_50 through BAYES_99.

The more messages that you correctly train the more accurate and
definitive bayes will be.

If you don't have the incorrectly learned messages to retrain then you
can always start over by removing the bayes database files in your
.spamassassin directory.
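
As a sketch of that decision rule in Python: `relearn_action` and the `cutoff` value are illustrative assumptions (0.5 stands in loosely for "BAYES_50"); only the `--spam` and `--ham` options are real sa-learn options.

```python
def relearn_action(is_spam, bayes_probability, cutoff=0.5):
    """Return the sa-learn option for a message, or None if Bayes already agrees.

    is_spam: the human verdict on the message.
    bayes_probability: Bayes spam probability in [0, 1].
    cutoff: assumed dividing line between hammy and spammy results.
    """
    if is_spam and bayes_probability < cutoff:
        return "--spam"  # spam that Bayes thought was hammy
    if not is_spam and bayes_probability >= cutoff:
        return "--ham"   # ham that Bayes thought was spammy
    return None
```

So a spam message that hit BAYES_00 gives `--spam`, meaning it should be fed to `sa-learn --spam`.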

-jeff


Re: negative scores for spam

2009-03-20 Thread Jeff Mincy
   From: Jesse Stroik 
   Date: Fri, 20 Mar 2009 16:14:39 -0500
   
   Hoover Chan wrote:
   > The threshold was set to 6.6 (cf. required=6.6). The message this was
   > attached to was very definitely junk. This kind of situation got me curious
   > about the whole thing where any positive spam score is set as the threshold but
   > seeing junk mail coming in with negative scores.
   
   You are getting negative scores for auto white list and for bayes_00. 
   It's a matter of taste and what you believe makes sense, but I don't 
   consider bayes to be all that accurate (since there are methods for 
   defeating bayes, poisoning bayes, etc).  As such, I don't allow Bayes to 
   assign negative scores or positive scores within a couple of points of 
   the threshold.  You can do so by assigning scores like this:
   
   score BAYES_00  0
   score BAYES_05  0
   score BAYES_20  0
   score BAYES_40  0
   
Yow.  The negative scoring bayes rules are extremely reliable when well
trained.  Ham messages are not trying to evade the filter.  Defeating
bayes with poison is mostly a myth.  The random garbage might work the
first time but not the second time as long as you are training these
messages as spam.  If you are getting lots of BAYES_00 hits on spam
then the problem is almost certainly incorrect training where spam
messages were incorrectly learned as ham.

   I also disable AWL since a lot of spam, especially the stuff most likely 
   to be tested against spamassassin, will likely use known good email 
   addresses from your domain as the "from" address.  This is fairly likely 
   to hit on the AWL.

Yow again.   AWL uses the email address and the IP address.  So a forged
email address used in spam is not going to have the same EMAIL+IP
pair as legitimate email using the same email address.
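
A simplified Python model of why the pairing matters. This is not the real AWL storage format (which may reduce the IP address before using it as a key); `PairHistory` and the addresses are invented for illustration.

```python
from collections import defaultdict


class PairHistory:
    """Track past scores per (sender address, sending IP) pair."""

    def __init__(self):
        self._scores = defaultdict(list)

    def record(self, address, ip, score):
        self._scores[(address.lower(), ip)].append(score)

    def mean(self, address, ip):
        """Average past score for this exact pair, or None if never seen."""
        scores = self._scores.get((address.lower(), ip))
        if not scores:
            return None
        return sum(scores) / len(scores)
```

A spammer forging a known address from a different IP lands on a key with no history, so the good reputation of the legitimate pair does not carry over.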
   
   Again, it's just a matter of taste and it all depends on how you've set 
   up your scoring.  I'm pretty cautious to ensure there aren't false 
   positives as that would decrease the value of spamassassin greatly for 
   us, but I otherwise avoid AWL and Bayes negative scores.
   
   If you sent us a copy of the spam, we could test it and show you what 
   should be hitting.

Use pastebin instead.

-jeff


Re: negative scores for spam

2009-03-20 Thread Jeff Mincy
   From: Hoover Chan 
   Date: Fri, 20 Mar 2009 13:55:08 -0700 (PDT)
   
   The threshold was set to 6.6 (cf. required=6.6). The message this
   was attached to was very definitely junk. This kind of situation got
   me curious about the whole thing where any positive spam score is
   set as the threshold but seeing junk mail coming in with negative
   scores.
   
Train BAYES.  The message hit BAYES_00.  You want BAYES_99.  So either
you have incorrectly learned similar messages or you haven't trained
enough.
-jeff
   
   
   -- 
   Hoover Chan c...@sacredsf.org 
   Technology Director 
   Schools of the Sacred Heart 
    Broadway St. 
   San Francisco, CA 94115
   
   
   - "Rick Macdougall"  wrote:
   
   > Hoover Chan wrote:
   > > Can someone point me to what I can do to my Spam Assassin config for
   > a situation like the following?
   > > 
   > > X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
   > >  tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
   > >  URIBL_BLACK=1.955, URIBL_GREY=0.25]
   > > 
   > > That is, a positive score criterion with a spam message that comes
   > out with a negative number.
   > > 
   > 
   > Errr
   > 
   > -1.103 - 2.599 + 0.001 + 1.955 + 0.25 = -1.49600
   > 
   > Where do you see that it should be positive ?
   > 
   > Regards,
   > 
   > Rick
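
For anyone who wants to check that arithmetic mechanically, a small Python sketch that sums the per-rule scores from an X-Spam-Status header in the format quoted above. `sum_test_scores` is a made-up helper, and it assumes the `tests=[NAME=score, ...]` layout shown in that header.

```python
import re


def sum_test_scores(status_header):
    """Sum the per-rule scores listed in the tests=[...] part of X-Spam-Status."""
    match = re.search(r"tests=\[(.*?)\]", status_header, re.DOTALL)
    if not match:
        return 0.0
    total = 0.0
    for item in match.group(1).split(","):
        _name, _, value = item.strip().partition("=")
        total += float(value)
    return round(total, 3)


header = (
    "No, score=-1.496 tagged_above=-10 required=6.6\n"
    " tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,\n"
    " URIBL_BLACK=1.955, URIBL_GREY=0.25]"
)
print(sum_test_scores(header))
```

The two negative rules (AWL and BAYES_00) outweigh the URIBL hits, which is why the total ends up below zero.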


Re: SpamAssassins bayes mechanism and message headers

2009-03-18 Thread Jeff Mincy
   From: Matt Kettler 
   Date: Wed, 18 Mar 2009 19:49:53 -0400
   
   Jeff Mincy wrote:
   >From: Matt Kettler 
   >Date: Tue, 17 Mar 2009 21:30:02 -0400
   >
   >fl...@pbartels.info wrote:
   >> Hello,
   >>
   >> instead of disabling a lot possibly set message headers using
   >> "bayes_ignore_header" and ending up in strange configs like:
   >>
   >> bayes_ignore_header Return-Path
   >...
   >> (found on the net)
   >Where?
   >>
   >> shouldn't SpamAssassins bayes mechanism just ignore the complete
   >> message header and just look at the body?
   >> This seems useful in my opinion.
   >It seems like a very misguided idea to me.
   >
   >Is there any reason to think headers make bad tokens?
   >Do you have any test data showing this improves your bayes accuracy?
   >
   > Yes - I think some headers make extremely bad tokens for bayes, for
   > example the X-Mailer/User-Agent headers.   40% of the spam I get
   > claims to  have Microsoft Outlook as a x-Mailer.   So bayes rapidly
   > determines that *UAMicrosoft (etc) is an extremely strong token.
   > These *UA tokens were enough to push a short ham message to BAYES_99.
   > When I added an bayes_ignore_header the score dropped to ~BAYES_40
   >   
   That seems rather extraordinarily strange. Did the messages match no
   other tokens at all?  (ie: did you run it through spamassassin -D bayes
   before and after?)
   
This was the X-Spam-Bayes header that was added at the time:
   X-Spam-Bayes: bayes=1., N=27(19-0+13), ham=(), spam=(HTo:U*mincy, HTo:D*com, HTo:D*rcn.com, H*F:D*net, H*UA:Build)

This header was added using:
   add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)


So, there are 27 tokens, 0 hammy, 13 spammy.
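
If you want to pull those counts out of many messages, a Python sketch along these lines should work. It assumes the custom X-Spam-Bayes layout from the add_header template above; `parse_bayes_counts` is a hypothetical helper, not part of SpamAssassin.

```python
import re


def parse_bayes_counts(header_value):
    """Extract the N=total(learned-hammy+spammy) counts from the custom header."""
    match = re.search(r"N=(\d+)\((\d+)-(\d+)\+(\d+)\)", header_value)
    if match is None:
        return None
    total, learned, hammy, spammy = (int(group) for group in match.groups())
    return {"tokens": total, "learned": learned, "hammy": hammy, "spammy": spammy}
```

The field order follows the template: _BAYESTC_, then _BAYESTCLEARNED_, _BAYESTCHAMMY_ and _BAYESTCSPAMMY_ inside the parentheses.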

   I'd be very interested in what's going on there, because it makes very
   little sense unless the message really matched very, very little other
   existing training.
   
3 of the top 5 spammy tokens eg: HTo:U*mincy, HTo:D*com, HTo:D*rcn.com
come from the To: mi...@rcn.com header.  The  H*UA:Build came from a
  'X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)'
header.  As I recall, there were various H*UA:Outlook etc headers.

Bayes was 100.000% sure that this message was spam based on the To,
X-Mailer, and From headers.  The envelope on every email message that I
read at home is addressed to mi...@rcn.com (ignoring for the moment
that mi...@starpower.net also happens to get to me).  The 'To:' header
is either going to be mi...@rcn.com or some made up email address that
will never be repeated.  So Bayes will see my
email address in both spam and ham.  At the time more than 80% of the
email I was getting at rcn.com was spam, so To: mi...@rcn.com was
turned into three strong spam tokens.  My real mi...@rcn.com email
address in the To header says nothing about the spamminess of the
message.  (This is in contrast to the mi...@starpower.net email address,
which is almost certainly spam and has been added to
blacklist_to.)  So my solution was to add 'bayes_ignore_header To
From' and use blacklist_to/blacklist_from for the suspect email
addresses.  I came up with similar justification for adding
'bayes_ignore_header X-Mailer'.

The body of the message was a single sentence asking me about my
primary music software.

If you want to see more detail, let's take it off the public mailing
list.

-jeff


Re: SpamAssassins bayes mechanism and message headers

2009-03-18 Thread Jeff Mincy
   From: Greg Troxel 
   Date: Wed, 18 Mar 2009 15:33:31 -0400
   
   Jeff Mincy  writes:
   
   >From: Matt Kettler 
   >Date: Tue, 17 Mar 2009 21:30:02 -0400
   >
   >> shouldn't SpamAssassins bayes mechanism just ignore the complete
   >> message header and just look at the body?
   >> This seems useful in my opinion.
   >It seems like a very misguided idea to me.
   >
   >Is there any reason to think headers make bad tokens?
   >Do you have any test data showing this improves your bayes accuracy?
   >
   > Yes - I think some headers make extremely bad tokens for bayes, for
   > example the X-Mailer/User-Agent headers.   40% of the spam I get
   
   I think I'm having a similar problem, where I get spam via a
   mailinglist, and bayes gives the spam credit for having similar headers
   to the ham which arrives on the list.  I'm not so concerned about
   including the headers as they arrive at the list server, but all the
   headers added from receipt by the list server seem inappropriate.
   
   I'll try bayes_ignore_header.

Scanning mailing list email is more trouble than it's worth.  It can
be done, but you have to be very motivated and it is a lot of work to
maybe catch a few mailing list spam messages.

Bayes needs to ignore any headers and any special footer tokens added by
the mailing list postings.  You need to extend trusted_networks to the
mailing list so that various tests are done on the submitter instead of
the mailing list.  DCC should be whitelisted for most mailing lists
since the email messages are bulk.  Any automatic reporting needs to be
turned off.  I'm sure there are other things that I'm forgetting.

If the mailing list has reasonably good spam filtering then just skip
running SpamAssassin.

-jeff


Re: SpamAssassins bayes mechanism and message headers

2009-03-18 Thread Jeff Mincy
   From: Matt Kettler 
   Date: Tue, 17 Mar 2009 21:30:02 -0400
   
   fl...@pbartels.info wrote:
   > Hello,
   >
   > instead of disabling a lot possibly set message headers using
   > "bayes_ignore_header" and ending up in strange configs like:
   >
   > bayes_ignore_header Return-Path
   ...
   > (found on the net)
   Where?
   >
   > shouldn't SpamAssassins bayes mechanism just ignore the complete
   > message header and just look at the body?
   > This seems useful in my opinion.
   It seems like a very misguided idea to me.
   
   Is there any reason to think headers make bad tokens?
   Do you have any test data showing this improves your bayes accuracy?

Yes - I think some headers make extremely bad tokens for bayes, for
example the X-Mailer/User-Agent headers.   40% of the spam I get
claims to have Microsoft Outlook as an X-Mailer.   So bayes rapidly
determines that *UAMicrosoft (etc) is an extremely strong token.
These *UA tokens were enough to push a short ham message to BAYES_99.
When I added a bayes_ignore_header the score dropped to ~BAYES_40.

Obfuscated words like 'st0ck' are 100% indications of spam (or of
messages that discuss spam), so these words work great for bayes.
An 'X-Mailer: Microsoft Office Outlook' header doesn't really tell you
anything about the message, at least not to the extent that bayes
treats these tokens.

The Message-ID tokens are also low quality tokens.  Most of these
tokens are hapaxes that are never used by other messages.  These just
fill up the bayes database.  If the Message-ID tokens were processed
further they might be more useful for bayes - eg -
replace 1234.56789 with a format %4d.%5d, or throw out all of the
timestamp numbers and keep just the stuff after the @.
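
A minimal Python sketch of both ideas. SpamAssassin does not do this today as far as this message suggests; `digits_to_format` and `domain_only` are hypothetical helpers and the example Message-ID is made up.

```python
import re


def digits_to_format(message_id):
    """Replace each run of digits with a %Nd placeholder, N being the run length."""
    return re.sub(r"\d+", lambda m: "%{}d".format(len(m.group(0))), message_id)


def domain_only(message_id):
    """Drop the angle brackets and keep only the part after the last '@'."""
    return message_id.strip("<>").rpartition("@")[2]


print(digits_to_format("1234.56789"))
print(domain_only("<1234.56789@mail.example.com>"))
```

Either transformation turns a one-off token into something that can repeat across messages from the same software or site, which is what bayes needs to learn anything from it.
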
-jeff


Re: Some emails pass spamassassin unprocessed

2009-02-20 Thread Jeff Mincy
   From: Monky 
   Date: Fri, 20 Feb 2009 03:31:14 -0800 (PST)
   
   Hello,
   I am running the Spamd Daemon version 3.2.5 on my Linux web and mail server
   and in general it works well. From time to time (somewhere in between 1-10%
   of all emails) spam passes the filter - but not because spamassassin decides
   that it is ham but because the email never gets processed by spamassassin
   (the header shows no X-Spam at all).

Look in the mail log files to see what was happening when messages are
passed through unprocessed.  SpamAssassin could be waiting on lock
files.  For example, Bayes files are locked while an automatic Bayes
expiry runs.

-jeff


Re: vbounce and out of office messages

2009-02-01 Thread Jeff Mincy
   From: Kai Schaetzl 
   Date: Sun, 01 Feb 2009 17:40:00 +0100
   
   Jeff Mincy wrote on Sun, 1 Feb 2009 10:01:49 -0500:
   
   > I use vbounce rules to detect bounce messages that were missed by
   > various procmail filtering rules.  Any message identified as a bounce
   > is processed and delivered differently in procmail rules.  So, any
   > vbounce FP is rather painful.
   
   No, it is not, unless you score these rules too high or unless you use the 
   single rules for triggering other actions. That's what SA is all about: 
   scoring. ...

Huh?   You don't want bounces to be processed as regular spam.
If you train bayes on bounces then you are training bayes to detect
bounces and pretty soon SpamAssassin will detect all bounces,
including valid bounces as spam.

This comment is taken from the 20_vbounce.cf file:
 # If you use this, set up procmail or your mail app to spot the
 # "ANY_BOUNCE_MESSAGE" rule hits in the X-Spam-Status line, and move
 # messages that match that to a 'vbounce' folder.

   ... If you try to (mis-)use it in other ways problems are to be 
   expected. That's not the fault of the vbounce rules.

The purpose of 20_vbounce is to detect and identify bounces so that
you may process bounce messages differently.

So I disagree, any FP in the vbounce rules is the fault of vbounce
rules and prevents these rules from being used as designed.

   AFAIK, the default score for all the BOUNCE rules is 0.1

Right.  If you aren't going to use the vbounce rules for extra processing
then there really isn't any point in running the rules.  The low default
score pretty much guarantees that message classification will not change
one way or the other.
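
For the "spot the ANY_BOUNCE_MESSAGE rule hit in the X-Spam-Status line" step from the 20_vbounce.cf comment, here is a rough Python equivalent of what a procmail recipe would test. `is_bounce` is a made-up name and the sample message is invented.

```python
from email import message_from_string


def is_bounce(raw_message):
    """Return True if X-Spam-Status lists the ANY_BOUNCE_MESSAGE rule."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    return "ANY_BOUNCE_MESSAGE" in status


sample = (
    "X-Spam-Status: No, score=0.1 required=5.0\n"
    " tests=ANY_BOUNCE_MESSAGE,BOUNCE_MESSAGE\n"
    "Subject: Undelivered Mail Returned to Sender\n"
    "\n"
    "body\n"
)
print(is_bounce(sample))
```

The check only looks at which rule hit, not at the score. That is why a false positive in the rule hurts even though the rule scores only 0.1.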

-jeff


Re: vbounce and out of office messages

2009-02-01 Thread Jeff Mincy
   From: Kai Schaetzl 
   Date: Sun, 01 Feb 2009 14:31:17 +0100
   
   Karsten Bräckelmann wrote on Fri, 30 Jan 2009 19:42:16 +0100:
   
   > FWIW, and to make Michael happy, I just caught one today -- hit another
   > rule, __BOUNCE_OOO_3. Sadly, it also hit __BOUNCE_AUTO_REPLY. So there's
   > more to disable...
   
   why? Why disable a rule because of a few FPs? If that rule isn't scored in 
   any way that makes it a threat that is perfectly acceptable. It's the 
   overall behavior of a rule that makes it worth or not worth using it, not 
   a few FPs. Nobody, at least not me, expects these rules to be free of FPs.
   
I use vbounce rules to detect bounce messages that were missed by
various procmail filtering rules.  Any message identified as a bounce
is processed and delivered differently in procmail rules.  So, any
vbounce FP is rather painful.  If you aren't doing anything special
delivering bounce messages then a FP in this rule wouldn't matter very
much.

-jeff


Re: profile the various tests being done

2009-01-21 Thread Jeff Mincy
   From:  "Brian J. Murrell" 
   Date: Wed, 21 Jan 2009 19:15:19 + (UTC)
   
   I'm trying to figure out why in some cases, spamd is taking in excess of 
   1200s to process messages.  Is there any way to profile (i.e. time, or 
   timestamp) each of the tests that spamd is doing so I can see where the 
   longest ones are?

   Even enabling the kind of debug that "spamassassin -D" produces, along 
   with timestamps for each line of debug would be useful.
   
Somebody else posted this a while back.

Run spamassassin -D < email.txt 2>&1 | timestamp

where timestamp is a function defined in .bashrc:

  function timestamp()
  { perl -MPOSIX -MTime::HiRes -n -e '
  BEGIN {$|=1; $dp=0; $t0=Time::HiRes::time};
  $t=Time::HiRes::time; $dt=$t-$t0; printf("%s%06.3f %4.3f %4.3f %s",
POSIX::strftime("%H:%M:",localtime($t)), $t-int($t/60)*60,
$dt, $dt-$dp, $_); $dp=$dt' $*
  }
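
If Perl is not your thing, roughly the same filter can be sketched in Python. This is an assumed equivalent, not the original poster's code; `timestamp_lines` and `main` are names invented here, and the `clock` parameter exists only to make the function easy to test.

```python
import sys
import time


def timestamp_lines(lines, clock=time.time):
    """Prefix each line with wall-clock time, total elapsed and per-line delta."""
    start = clock()
    previous = 0.0
    for line in lines:
        now = clock()
        elapsed = now - start
        stamp = time.strftime("%H:%M:%S", time.localtime(now))
        yield "{} {:.3f} {:.3f} {}".format(stamp, elapsed, elapsed - previous, line)
        previous = elapsed


def main():
    """Read debug output on stdin and write the timestamped lines to stdout."""
    for out in timestamp_lines(sys.stdin):
        sys.stdout.write(out)
        sys.stdout.flush()
```

Add a call to `main()` at the bottom of a script to use it as the right-hand side of the pipe. A large number in the third column marks the debug line that followed a slow step.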

Or pipe it directly to the one liner:

spamassassin -D < email.txt 2>&1 | perl -MPOSIX 

-jeff


Re: Spam with clean URI's which forward to DNSBListed URL (by HTML redirect header)

2009-01-07 Thread Jeff Mincy
   From: Theo Van Dinter 
   Date: Wed, 7 Jan 2009 11:36:18 -0500
   
   On Wed, Jan 07, 2009 at 04:46:44PM +0100, Florian Lagg wrote:
   > So - if possible - I want spamassassin to:
   > 1. Request the links in the mail body and check them for http-error 302 or
   > meta redirects
   > 2. Check the links we got by doing this against some DNSBL's
   >  
   > Is this possible? Is there a reason why we shouldn't do this?

You can look at the WebRedirect plugin on 
http://wiki.apache.org/spamassassin/CustomPlugins
   
   Possible?  Sure.
   Should?  Not unless you want to turn your (and anyone else running that code's)
   machine into a DDoS client.

   In other words, while it's possible to shoot yourself in the face, it's really
   not a good idea to do so.

There are various WARNING: PRIVACY AND TECHNICAL ISSUES listed in the
plugin.   I used the plugin for a while, but stopped using it when the
number of hits dropped off.

-jeff


Re: sa-update damages existing SA installation

2008-12-18 Thread Jeff Mincy
   From: Marcin Krol 
   Date: Thu, 18 Dec 2008 18:37:12 +0100
   
   Hello everyone,
   
   When I run sa-update -D --gpgkey 6C6191E3 --channel 
   sought.rules.yerp.org, it damages my SA installation!
   
sa-update puts rules in /var/lib/spamassassin/.  Once this directory
exists all site rules are expected to come from this directory.  The
previous installation directory (eg /usr/local/share/spamassassin) is
ignored.

Try doing sa-update of the normal rules before you use sa-update of
additional rule sets.
   ...

   And my SA doesn't score any mails anymore! I have to purge the existing 
   SA (dpkg -P spamassassin), reinstall it from scratch, restore conf files 
   from backups and then it works.
   
   WTF! Does anybody know what goes wrong?
   
Use -D to see which config files are being read by spamassassin:

   % spamassassin --lint -D 2>&1 | fgrep 'config: using'
   [31869] dbg: config: using "/etc/mail/spamassassin" for site rules pre files
   [31869] dbg: config: using "/var/lib/spamassassin/3.001007" for sys rules pre files
   [31869] dbg: config: using "/var/lib/spamassassin/3.001007" for default rules dir
   [31869] dbg: config: using "/etc/mail/spamassassin" for site rules dir
   [31869] dbg: config: using "/home/jeff/.spamassassin/user_prefs" for user prefs file
   [31869] dbg: config: using "/var/lib/spamassassin/3.001007/updates_spamassassin_org/empty.pre" for included file
   [31869] dbg: config: using "/var/lib/spamassassin/3.001007/updates_spamassassin_org/10_misc.cf" for included file
   [31869] dbg: config: using "/var/lib/spamassassin/3.001007/updates_spamassassin_org/20_advance_fee.cf" for included file
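
If the list is long, a few lines of Python can reduce it to path/purpose pairs. This sketch assumes each debug entry is on a single line in the 'config: using "path" for purpose' form shown in the output; `config_sources` is a hypothetical helper.

```python
import re

CONFIG_LINE = re.compile(r'dbg: config: using "([^"]+)" for (.+)$')


def config_sources(debug_output):
    """Return (path, purpose) pairs for each 'config: using' debug line."""
    pairs = []
    for line in debug_output.splitlines():
        match = CONFIG_LINE.search(line)
        if match:
            pairs.append((match.group(1), match.group(2).strip()))
    return pairs
```

If the "default rules dir" entry points under /var/lib/spamassassin, the rules from the original installation directory are not being read.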

-jeff


Re: White List From RCVD

2008-12-11 Thread Jeff Mincy
   From: mouss 
   Date: Thu, 11 Dec 2008 19:55:44 +0100
   
   Asif Iqbal a écrit :
   > I have this in local.cf in qmail.here.net's /etc/mail/spamassassin dir
   > 
   >   whitelist_from_rcvd joe.sm...@here.com  qtdenexmbm24.AD.HERE.COM
   > 
   > But email from that address still tagged as spam. What am I doing wrong?
   > 
   
   you should run the message through spamassassin -D to see which relays
   are trusted.
   
   or you could get lucky with:
   
   always_trust_envelope_sender 1
   
   
If you add a Relay header eg: 
  add_header all Relay trusted=_RELAYSTRUSTED_, untrusted=_RELAYSUNTRUSTED_

Then you want the rdns= from the first untrusted relay.

In this case it is probably:
  whitelist_from_rcvd joe.sm...@here.com here.com

The whitelist probably won't work for here.com
because of the lack of reverse DNS.
  Received: from NO?REVERSE?DNS (HELO sudnp799.here.com)

The debug output should confirm this.


RE: about fake mails

2008-12-07 Thread Jeff Mincy
   From: "Giampaolo Tomassoni" <[EMAIL PROTECTED]>
   Date: Sun, 7 Dec 2008 15:52:10 +0100
   
   > -Original Message-
   > From: Yavuz Maslak [mailto:[EMAIL PROTECTED]
   > Sent: Sunday, December 07, 2008 3:02 PM
   > 
   > Ok
   > I have started to use dkim verification.  I defined whitelists in
   > local.cf.
   > it works.
   > But I could not find how I give high score for  a spammer who doesn't
   > use
   > gmail's mail servers.
   > 
   > Although a  domain has domain keys, how can I give positive score for a
   > mail
   > which comes from a fake smtp server ?
   
   There is no direct way (to my knowledge) to do this.
   
   You have to apply a positive score to all mail claiming to be "From:" a
   gmail address, then apply a negative score voiding the first one to the
   DKim-verified ones. 
   
You can write a meta rule for email that claims to be from gmail that
does not have DKIM.  

   # add some penalty points to mail from yahoo and gmail.com which
   # does not carry a valid signature; exempt mail from mailing lists
   header __L_ML1   Precedence =~ m{\b(list|bulk)\b}i
   header __L_ML2   exists:List-Id
   header __L_ML3   exists:List-Post
   header __L_ML4   exists:Mailing-List
   header __L_HAS_SNDR  exists:Sender
   meta   __L_VIA_ML  (__L_ML1 || __L_ML2 || __L_ML3 || __L_ML4 || __L_HAS_SNDR)
   header __L_FROM_Y1   From:addr =~ [EMAIL PROTECTED]
   header __L_FROM_Y2   From:addr =~ [EMAIL PROTECTED](ar|br|cn|hk|my|sg)$}i
   header __L_FROM_Y3   From:addr =~ [EMAIL PROTECTED](id|in|jp|nz|uk)$}i
   header __L_FROM_Y4   From:addr =~ [EMAIL PROTECTED](ca|de|dk|es|fr|gr|ie|it|pl|se)$}i
   meta   __L_FROM_YAHOO (__L_FROM_Y1 || __L_FROM_Y2 || __L_FROM_Y3 || __L_FROM_Y4)
   header __L_FROM_GMAIL From:addr =~ [EMAIL PROTECTED]
   meta L_UNVERIFIED_YAHOO  (!DKIM_VERIFIED && !DK_VERIFIED && __L_FROM_YAHOO && !__L_VIA_ML)
   priority L_UNVERIFIED_YAHOO  500
   score    L_UNVERIFIED_YAHOO  2.5
   meta L_UNVERIFIED_GMAIL  (!DKIM_VERIFIED && __L_FROM_GMAIL && !__L_VIA_ML)
   priority L_UNVERIFIED_GMAIL  500
   score    L_UNVERIFIED_GMAIL  2.5

I got these rules from this list.  I added !DK_VERIFIED to
L_UNVERIFIED_YAHOO.
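
Spelled out as plain boolean logic in Python, the two meta rules come down to the following. This only illustrates how the sub-rules combine; the function and argument names are invented and nothing here talks to SpamAssassin.

```python
def unverified_gmail(dkim_verified, from_gmail, via_mailing_list):
    """Mirror of L_UNVERIFIED_GMAIL: gmail sender, no valid DKIM, not list mail."""
    return (not dkim_verified) and from_gmail and (not via_mailing_list)


def unverified_yahoo(dkim_verified, dk_verified, from_yahoo, via_mailing_list):
    """Mirror of L_UNVERIFIED_YAHOO, which also accepts a DomainKeys signature."""
    return (
        (not dkim_verified)
        and (not dk_verified)
        and from_yahoo
        and (not via_mailing_list)
    )
```

The mailing list exemption is there because lists often modify messages in ways that break the signature, so an unverified gmail address on list mail is not much evidence of forgery.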

-jeff

