Re: Bayes_vars records on MySQL not created automatically

2013-05-10 Thread Matteo Dessalvi
Thanks for your answer Michael.

Yes, you are right: running sa-learn --sync as one of the SA users creates 
the proper record in the bayes_vars table.
So I guess the only problem is that this user has not yet received enough 
ham/spam email.

Matteo



From: Michael Parker par...@herk.net
To: Matteo Dessalvi mte...@yahoo.it
Cc: users@spamassassin.apache.org
Sent: Wednesday, 8 May 2013 18:43
Subject: Re: Bayes_vars records on MySQL not created automatically



On May 8, 2013, at 8:06 AM, Matteo Dessalvi mte...@yahoo.it wrote:

 
> I always thought that SA would be able to operate autonomously and that it
> will create the proper records in all the tables of the DB. Am I missing
> something? Is this the designed behavior?
 

It's been a while since I wrote and looked at the code, but I'm pretty sure that 
the bayes_vars entry won't be created until you learn something as that user.

Try running sa-learn or triggering an auto-learn for that user and see what happens.

If memory serves, the behavior was deliberate so that you wouldn't get hundreds 
of entries in bayes_vars when messages are checked for users who may not be real.

Michael
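
The behavior Michael describes (no per-user row until something is learned) can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite table; the table name bayes_vars_demo, its columns, and the function names are assumptions made for the example and are not SpamAssassin's actual schema or code.

```python
import sqlite3


def open_db():
    """Create an in-memory table loosely modelled on a per-user Bayes record."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE bayes_vars_demo ("
        " username TEXT PRIMARY KEY,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        " ham_count INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def check(db, username):
    """Read-only lookup: scanning a message never inserts a row."""
    return db.execute(
        "SELECT spam_count, ham_count FROM bayes_vars_demo WHERE username = ?",
        (username,),
    ).fetchone()  # None means "no Bayes data for this user yet"


def learn(db, username, is_spam):
    """Learning is the only path that creates the row, then bumps a counter."""
    db.execute(
        "INSERT OR IGNORE INTO bayes_vars_demo (username) VALUES (?)",
        (username,),
    )
    column = "spam_count" if is_spam else "ham_count"  # fixed set, safe to interpolate
    db.execute(
        f"UPDATE bayes_vars_demo SET {column} = {column} + 1 WHERE username = ?",
        (username,),
    )
    db.commit()
```

With this shape, mail checked for a nonexistent recipient leaves the table untouched, which matches the stated motive of avoiding hundreds of rows for users who may not be real.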


Re: autolearn_discriminator callback not getting called.

2013-05-10 Thread psychobyte
So I was able to write a plugin that overrides the AWL 
check_from_in_auto_whitelist() eval rule. Thanks for the help, Karsten.


= /etc/spamassassin/25_AwlIgnore.cf

##AWL ignore address types (periods in names are not supported)
## @see  AwlIgnore.pm
awl_ignore_from postmaster mailer-daemon

## Overridden AWL params
#
header AWL eval:awl_ignore_check_from_in_auto_whitelist()
describe AWL From: address is in the auto white-list
tflags AWL userconf noautolearn
priority AWL 1000


= /etc/spamassassin/init_AwlIgnore.pre

## Enable the AWL Ignore plugin
loadplugin Mail::SpamAssassin::Plugin::AwlIgnore

= AwlIgnore.pm

package Mail::SpamAssassin::Plugin::AwlIgnore;
#
# This plugin overrides the AWL check_from_in_auto_whitelist() method in
# order to keep specific types of addresses out of the whitelist database.
# For example, don't add addresses like postmas...@example.com to the
# AWL database.
#
# To activate this plugin:
# 1) Enable the Mail::SpamAssassin::Plugin::AWL plugin.
# 2) Enable this plugin, e.g.
#      loadplugin Mail::SpamAssassin::Plugin::AwlIgnore
#    Check /etc/spamassassin/init_AwlIgnore.pre.
# 3) Update your config. Check 25_AwlIgnore.cf. It should look something
#    like this:
#
#   ## AWL ignore address types (periods in names are not supported)
#   ## @see  AwlIgnore.pm
#   awl_ignore_from postmaster mailer-daemon
#
#   ## Overridden AWL params
#   header AWL eval:awl_ignore_check_from_in_auto_whitelist()
#   describe AWL From: address is in the auto white-list
#   tflags AWL userconf noautolearn
#   priority AWL 1000
#
# @todo - support ignoring local addresses w/ . in them, i.e.
#         user.name...@example.com



use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;  # provides dbg(), used below
use strict;

use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);

# constructor: register the eval rule
sub new {
  my $class = shift;
  my $mailsaobject = shift;

  # some boilerplate...
  $class = ref($class) || $class;
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);
  $self->set_config($mailsaobject->{conf});

  $self->register_eval_rule ('awl_ignore_check_from_in_auto_whitelist');
  return $self;
}

#
# Load params from config
#
sub set_config {
  my($self, $conf) = @_;
  my @cmds;

=item awl_ignore_from

Ignore address types from going into the AWL database.

=cut

  push (@cmds, {
    setting => 'awl_ignore_from',
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST
  });
  $conf->{parser}->register_commands(\@cmds);
}

#
# Replace check_from_in_auto_whitelist()
#
sub awl_ignore_check_from_in_auto_whitelist {
  my ($self, $pms) = @_;

  return 0 unless ($pms->{conf}->{use_auto_whitelist});

  my $timer = $self->{main}->time_method("total_awl");

  my $from = lc $pms->get('From:addr');
  return 0 unless $from =~ /\S/;

  ## ignore addresses in awl_ignore_from
  foreach (keys %{$pms->{conf}->{awl_ignore_from}}) {
    if ($from =~ /$_\@/) {
      dbg("auto-whitelist: AWL ignoring " . $from);
      return 0;
    }
  }

  # find the earliest usable originating IP.  ignore private nets
  my $origip;
  foreach my $rly (reverse (@{$pms->{relays_trusted}},
                            @{$pms->{relays_untrusted}}))
  {
    next if ($rly->{ip_private});
    if ($rly->{ip}) {
      $origip = $rly->{ip}; last;
    }
  }

  my $scores = $pms->{conf}->{scores};
  my $tflags = $pms->{conf}->{tflags};
  my $points = 0;
  my $signedby = $pms->get_tag('DKIMDOMAIN');
  undef $signedby  if defined $signedby && $signedby eq '';

  foreach my $test (@{$pms->{test_names_hit}}) {
    # ignore tests with 0 score in this scoreset,
    # or if the test is marked as noautolearn
    next if !$scores->{$test};
    next if exists $tflags->{$test} && $tflags->{$test} =~ /\bnoautolearn\b/;

    $points += $scores->{$test};
  }

  my $awlpoints = (sprintf "%0.3f", $points) + 0;

  # Create the AWL object
  my $whitelist;
  eval {
    $whitelist = Mail::SpamAssassin::AutoWhitelist->new($pms->{main});

    my $meanscore;
    { # check
      my $timer = $self->{main}->time_method("check_awl");
      $meanscore = $whitelist->check_address($from, $origip, $signedby);
    }
    my $delta = 0;

    dbg("auto-whitelist: AWL active, pre-score: %s, autolearn score: %s, ".
        "mean: %s, IP: %s, address: %s %s",
        $pms->{score}, $awlpoints,
        !defined $meanscore ? 'undef' : sprintf("%.3f",$meanscore),
        $origip || 'undef',
        $from,  $signedby ? "signed by $signedby" : '(not signed)');

    if (defined $meanscore) {
      $delta = $meanscore - $awlpoints;
      $delta *= $pms->{main}->{conf}->{auto_whitelist_factor};

      $pms->set_tag('AWL', sprintf("%2.1f",$delta));
      if (defined $meanscore) {
        $pms->set_tag('AWLMEAN', sprintf("%2.1f", $meanscore));
      }
      $pms->set_tag('AWLCOUNT', sprintf("%2.1f", $whitelist->count()));
      $pms->set_tag('AWLPRESCORE', sprintf("%2.1f",

OT: installing on CentOS 6.4

2013-05-10 Thread Jari Fredriksson

I'm installing the latest CentOS, and would like to have SA on it too.

But to my disappointment, it has only SA 3.1.1 and neither Razor nor Pyzor.

What would be the best way to get a reasonably up-to-date SA onto this box?

-- 

You can rent this space for only $5 a week.






Re: OT: installing on CentOS 6.4

2013-05-10 Thread Bowie Bailey

On 5/10/2013 12:22 PM, Jari Fredriksson wrote:

> I'm installing the latest CentOS, and would like to have SA on it too.
>
> But to my disappointment, it has only SA 3.1.1 and neither Razor nor Pyzor.
>
> What would be the best way to get a reasonably up-to-date SA onto this box?



rpmforge has SA 3.3.2.

http://wiki.centos.org/AdditionalResources/Repositories/RPMForge

--
Bowie


RE: installing on CentOS 6.4

2013-05-10 Thread Randal, Phil
pyzor and perl-Razor-Agent are in epel.

Cheers,

Phil

-----Original Message-----
From: Jari Fredriksson [mailto:ja...@iki.fi]
Sent: 10 May 2013 17:22
To: SpamAssassin Users
Subject: OT: installing on CentOS 6.4


I'm installing the latest CentOS, and would like to have SA on it too.

But to my disappointment, it has only SA 3.1.1 and neither Razor nor Pyzor.

What would be the best way to get a reasonably up-to-date SA onto this box?

--

You can rent this space for only $5 a week.


Hoople Ltd, Registered in England and Wales No. 7556595
Registered office: Plough Lane, Hereford, HR4 OLE

Any opinion expressed in this e-mail or any attached files are those of the 
individual and not necessarily those of Hoople Ltd. You should be aware that 
Hoople Ltd. monitors its email service. This e-mail and any attached files are 
confidential and intended solely for the use of the addressee. This 
communication may contain material protected by law from being passed on. If 
you are not the intended recipient and have received this e-mail in error, you 
are advised that any use, dissemination, forwarding, printing or copying of 
this e-mail is strictly prohibited. If you have received this e-mail in error 
please contact the sender immediately and destroy all copies of it.


Re: OT: installing on CentOS 6.4

2013-05-10 Thread Jari Fredriksson
10.05.2013 19:27, Bowie Bailey wrote:
> On 5/10/2013 12:22 PM, Jari Fredriksson wrote:
> > I'm installing the latest CentOS, and would like to have SA on it too.
> >
> > But to my disappointment, it has only SA 3.1.1 and neither Razor nor Pyzor.
> >
> > What would be the best way to get a reasonably up-to-date SA onto this box?
>
> rpmforge has SA 3.3.2.
>
> http://wiki.centos.org/AdditionalResources/Repositories/RPMForge

I installed rpmforge, but it still offers 3.3.1.

I guess I have to use CPAN.

-- 

Q:  How many Zen masters does it take to screw in a light bulb?
A:  None.  The Universe spins the bulb, and the Zen master stays out
of the way.






Re: OT: installing on CentOS 6.4

2013-05-10 Thread Bowie Bailey

On 5/10/2013 1:09 PM, Jari Fredriksson wrote:
> 10.05.2013 19:27, Bowie Bailey wrote:
> > On 5/10/2013 12:22 PM, Jari Fredriksson wrote:
> > > I'm installing the latest CentOS, and would like to have SA on it too.
> > >
> > > But to my disappointment, it has only SA 3.1.1 and neither Razor nor Pyzor.
> > >
> > > What would be the best way to get a reasonably up-to-date SA onto this box?
> >
> > rpmforge has SA 3.3.2.
> >
> > http://wiki.centos.org/AdditionalResources/Repositories/RPMForge
>
> I installed rpmforge, but it still offers 3.3.1.
>
> I guess I have to use CPAN.


That's strange.  Now that I actually try to install it myself, I see the 
same thing (my home server uses this repo, but I haven't updated in a 
while).  But if you follow the link for the list of packages on the wiki 
page, it lists 3.3.2 for CentOS 4, 5, and 6.


And, even stranger, if I browse directly to the repo URL, I can't find 
spamassassin at all.


Maybe you should ask on their list.

http://lists.repoforge.org/mailman/listinfo/users

--
Bowie


RE: Default Bayes Database

2013-05-10 Thread Andrew Talbot
You all are keeping me sane and grounded as I deal with the Powers That Be
here trying to set this up. It's good to know that I'm not wrong (I agree
with everything everyone has said, and pointed out from the beginning a
default database would be awful). 

And this: "If he insists on starting with a pre-populated Bayes database,
he sure knows why. Other than 'I'm the boss, I want'." ... is exactly
right too. 

We're implementing it locally with auto-learning enabled this weekend (oh,
yeah, the boss didn't want auto-learning enabled either). 

So here goes!! 

Thanks for all your help. 


> -----Original Message-----
> From: Karsten Bräckelmann [mailto:guent...@rudersport.de]
> Sent: Wednesday, May 08, 2013 8:18 PM
> To: users@spamassassin.apache.org
> Subject: Re: Default Bayes Database
>
> On Wed, 2013-05-08 at 14:09 -0400, Andrew Talbot wrote:
> > Well, I certainly hope someone offers to help!
>
> Heh! I am really confident, Alex didn't mean to be rude, neither that he
> actually hopes no one will help you. Quite the contrary...
>
> He DID try to help you by explaining why a default Bayes database is a
> bad idea in the first place. And that was his way of telling you...
>
> > If only to say there is no default database.
>
> That. :)  There is none, and there never has been.
>
>
> > As we've spoken about off-list, my boss is being very particular about
> > the deployment of Bayes, and it sounds like one of his caveats is that
> > we don't start from a blank database.
>
> I can see how the idea of basing off of some known to be classified
> tokens sounds tempting. However, there is no such token. None. Just try to
> imagine working in an industry where e.g. "Viagra" and "Cialis" are
> totally legit phrases to use...
>
> Feel free to direct your boss here. If he insists on starting with a
> pre-populated Bayes database, he sure knows why. Other than "I'm the
> boss, I want".
>
>
> Anyway, Andrew, your idea of that whole "blank slate" is inaccurate. If
> you import someone else's data, before importing your database has been
> empty.
>
> If you collect some ham and spam for initial training, before training
> your database has been empty.
>
> You even do NOT have to deploy SA prior to that. I don't know the size of
> your user base, but it seems it shouldn't be hard to have a few of the
> users chip in. Get a few of them to collect hand-classified ham and spam
> for you. Train Bayes with that. After that, deploy SA to your mail
> processing chain.
>
> There you go! A pre-populated Bayes database, based on YOUR particular
> ham and spam tokens, before deploying SA in production.
>
>
> --
> char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}




Re: Default Bayes Database

2013-05-10 Thread David F. Skoll
On Wed, 08 May 2013 19:32:26 +0200
Axb axb.li...@gmail.com wrote:

> - your HAM is somebody else's SPAM

Do you have evidence for that?  The reason I ask is that one of the
main features of our (commercial) anti-spam solution is a very large
Bayes database.  Once a night, we aggregate all the tokens from votes from
all of our customers and push out a Bayes database containing tokens for the
last 21 days from about 3.2 million spam and 3.4 million ham messages.

It works really well and we find that even our highly diverse customer
database agrees substantially on spam vs. ham.

There was a USENIX paper on this topic quite a while ago:
http://static.usenix.org/event/lisa04/tech/blosser/blosser_html/
It won the best paper award for LISA '04.

> - A decent Bayes DB is highly dynamic and yesterday's tokens from
> someone else's traffic will be useless to your traffic, today.

Not true.  Bayes data remains relevant for several days, if not weeks or
months.

Obviously, our system *also* includes individual Bayes databases that adapt
to specific users' mail flows and updates more than once a day, but even the
daily-updated central database is surprisingly good.  (It seems that a large
sample size is the key.)

Karsten Bräckelmann wrote:

> Just try to imagine working in an industry where e.g. Viagra and
> Cialis are totally legit phrases to use...

Actually, we find that is not a problem because spammers use things
like Vi@gr@ and C1AL1S that are far more damning than the unmodified words
themselves.  Also, our Bayes implementation uses word pairs as well as
individual words which improves its selectivity.
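
To make the word-pair idea concrete, here is a minimal sketch of a tokenizer that emits both single words and adjacent word pairs. It is an illustration only: whitespace splitting and lowercasing are assumptions, and the function name tokens_with_pairs is hypothetical. It is neither the tokenizer of the commercial product described above nor SpamAssassin's.

```python
def tokens_with_pairs(text):
    """Return single-word tokens followed by adjacent word-pair tokens."""
    words = text.lower().split()
    # zip(words, words[1:]) walks neighbouring words: (w0, w1), (w1, w2), ...
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs
```

A pair token such as "health insurance" can carry a different spam/ham balance than "health" or "insurance" alone, which is the selectivity gain being claimed.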

Anyway, my main point is this: Don't dismiss a shared Bayes database
without supplying evidence that it's a bad idea. :)

Regards,

David.


Re: Default Bayes Database

2013-05-10 Thread Karsten Bräckelmann
On Fri, 2013-05-10 at 15:51 -0400, David F. Skoll wrote:
> On Wed, 08 May 2013 19:32:26 +0200 Axb axb.li...@gmail.com wrote:
>
> > - your HAM is somebody else's SPAM
>
> Do you have evidence for that?

Evidence... examples, rather.

I happened to be the lucky recipient of specific spam campaigns in
languages I do not speak. Campaign referring to quite a few samples
during a specific, relatively short time period. This definitely
happened with French, Spanish, and Turkish. Odds are high for any word
in those languages being on the seriously spammy side. Unlike for anyone
actually speaking these languages...

Being easily associated with particular water sports is like a magnet
for getting spammed with totally unrelated water sports. One style is
good, all others are bad-ish. That would be the same for other folks,
though with different signs.

I do receive quite specific campaigns, plain text, no obfuscation,
offering private health insurance ("Private Krankenversicherung" in
German). That is a totally valid phrase. Unlike English, German tends to
concatenate words to form specifics -- "Krankenversicherung" is pretty
much a word-by-word translation of "health insurance". This makes the
word more rare; "health" on its own, in comparison, hardly gives a hint.
And the totally legit word is spammy for me, because I usually do not
talk about that topic in mail. My next door neighbor probably would
disagree...

"Your ham is someone else's spam" on a different level: There are quite
a few reports in bugzilla, where an obfuscation pattern matches a legit
word in non-English languages.

Accents are good for obfuscation. But accents also are entirely legit.

Paypal. And them notifying their customers about changes in the terms of
use. And actually sending out the full terms of use in the same mail. In
this case, again, German -- but they managed to score a whopping 12.2
once for me. Yes, of course, BAYES_99.

Plus some other shady-business indicating rules, triggered various
times: FUZZY_CREDIT, TRACKER_ID, URI_DOT_INFO.
Oh, lovely. That 2009 sample has FUZZY_VLIUM and FRT_VALIUMx.


> Karsten Bräckelmann wrote:
> > Just try to imagine working in an industry where e.g. Viagra and
> > Cialis are totally legit phrases to use...
>
> Actually, we find that is not a problem because spammers use things
> like Vi@gr@ and C1AL1S that are far more damning than the unmodified words
> themselves.

That was one quick example. See above for a similar scenario not
involving medication, but sports.

> Also, our Bayes implementation uses word pairs as well as
> individual words which improves its selectivity.

Good for you, but that is irrelevant to the discussion at hand, which is
about the Bayes engine in SA.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Default Bayes Database

2013-05-10 Thread Bob Proulx
David F. Skoll wrote:
> Axb wrote:
> > - your HAM is somebody else's SPAM
>
> Do you have evidence for that?  The reason I ask is that one of the
> main features of our (commercial) anti-spam solution is a very large
> Bayes database.  Once a night, we aggregate all the tokens from votes from
> all of our customers and push out a Bayes database containing tokens for the
> last 21 days from about 3.2 million spam and 3.4 million ham messages.
>
> It works really well and we find that even our highly diverse customer
> database agrees substantially on spam vs. ham.

The weasel wording "agrees substantially" is telling.  If it isn't 100%
with no false positives then at least one of those messages does not
agree.  That would be the evidence requested.

I am not saying that your technique isn't useful.  It is very
pragmatic.  I am sure it is very effective.  I would probably do that
myself.  But it isn't 100%.

And would you suggest distributing your well-averaged database to
people who install SpamAssassin so as to seed their Bayes?  How would
that be distributed for users to use when installing SpamAssassin?
And if you did, how would this large corpus of learned symbols affect
the smaller number of messages the user trains with when they
train-on-error?  Would it swamp it by the much larger numbers?  It is
trouble.
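
The swamping concern can be illustrated with a deliberately naive per-token estimate. This is a sketch under stated assumptions: the formula below (relative spam frequency over combined frequency) is a simplification chosen for the example, not the combining method SpamAssassin's Bayes engine actually uses, and all counts are made up.

```python
def spam_probability(spam_hits, ham_hits, nspam, nham):
    """Naive per-token estimate: spam frequency over combined frequency."""
    spam_freq = spam_hits / nspam
    ham_freq = ham_hits / nham
    return spam_freq / (spam_freq + ham_freq)


def after_local_ham(spam_hits, ham_hits, nspam, nham, corrections):
    """Re-estimate after the user trains `corrections` ham messages
    that all contain the token (train-on-error)."""
    return spam_probability(
        spam_hits, ham_hits + corrections, nspam, nham + corrections
    )


# Hypothetical seeded database: one million messages of each class.
seeded = after_local_ham(9000, 1000, 1_000_000, 1_000_000, corrections=10)

# Hypothetical small local database with the same starting estimate of 0.9.
local = after_local_ham(9, 1, 100, 100, corrections=10)
```

Both databases start at 0.9 for the token. After ten local ham corrections the small database drops below 0.5, while the seeded one stays just under 0.9, which is the kind of inertia the question above is pointing at.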

I think having users start with a blank slate and then start learning
from their own messages makes the most sense.  And users can always
learn from their current mailbox of past messages so it isn't much
hardship.

Bob


Re: Default Bayes Database

2013-05-10 Thread David F. Skoll
On Fri, 10 May 2013 15:34:13 -0600
Bob Proulx b...@proulx.com wrote:

> The weasel wording "agrees substantially" is telling.  If it isn't 100%
> with no false positives then at least one of those messages does not
> agree.  That would be the evidence requested.
>
> I am not saying that your technique isn't useful.  It is very
> pragmatic.  I am sure it is very effective.  I would probably do that
> myself.  But it isn't 100%.

Nothing is 100%.  Even personal Bayes databases are not 100%.

> And would you suggest distributing your well-averaged database to
> people who install SpamAssassin so as to seed their Bayes?

We have a distribution mechanism built into our software.

> I think having users start with a blank slate and then start learning
> from their own messages makes the most sense.

Maybe.  But I know that our (commercial) customers expect high catch
rates out of the box, and we get that with our shared Bayes database.

> And users can always learn from their current mailbox of past
> messages so it isn't much hardship.

Right; pretend you're a salesperson trying to sell an anti-spam product.
"Oh, you just have to go through your old mailbox and classify a few
hundred messages by hand... then the system will work great!"

No sale.

Regards,

David.


Re: Default Bayes Database

2013-05-10 Thread David F. Skoll
On Fri, 10 May 2013 23:14:36 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

> I happened to be the lucky recipient of specific spam campaigns in
> languages I do not speak. Campaign referring to quite a few samples
> during a specific, relatively short time period. This definitely
> happened with French, Spanish, and Turkish. Odds are high for any word
> in those languages being on the seriously spammy side. Unlike for
> anyone actually speaking these languages...

We (probably) have a much larger sample population, so this tends not
to be as much of a problem for us.

> I do receive quite specific campaigns, plain text, no obfuscation,
> offering private health insurance ("Private Krankenversicherung" in
> German). That is a totally valid phrase. Unlike English, German tends
> to concatenate words to form specifics -- "Krankenversicherung" is
> pretty much a word-by-word translation of "health insurance". This
> makes the word more rare; "health" on its own, in comparison, hardly
> gives a hint. And the totally legit word is spammy for me, because I
> usually do not talk about that topic in mail. My next door neighbor
> probably would disagree...

Again, the key is a large sample size.

> "Your ham is someone else's spam" on a different level: There are
> quite a few reports in bugzilla, where an obfuscation pattern matches
> a legit word in non-English languages.

These are edge cases that are pretty easily handled with personal
Bayes databases or whitelisting if the system keeps getting it wrong.

> Accents are good for obfuscation. But accents also are entirely legit.

And we can tell which is which, based on a large sample size.

> Paypal. And them notifying their customers about changes in the terms
> of use. And actually sending out the full terms of use in the same
> mail. In this case, again, German -- but they managed to score a
> whopping 12.2 once for me. Yes, of course, BAYES_99.

Was this with your personal Bayes data?  Even that can be wrong sometimes...

Regards,

David.


Bayes Data Base

2013-05-10 Thread Rick Cone
Hello,

I was curious whether anybody out there publishes a SpamAssassin Bayes spam/ham
database that one could buy or subscribe to. If so, please provide
details if known.

Thanks,

Rick

Re: [SA-Users] Re: OT: installing on CentOS 6.4

2013-05-10 Thread John R. Dennison
On Fri, May 10, 2013 at 02:42:21PM -0400, Bowie Bailey wrote:
> That's strange.  Now that I actually try to install it myself, I see
> the same thing (my home server uses this repo, but I haven't updated
> in a while).  But if you follow the link for the list of packages on
> the wiki page, it lists 3.3.2 for Centos 4, 5, and 6.
>
> And, even stranger, if I browse directly to the repo url, I can't
> find spamassassin at all.
>
> Maybe you should ask on their list.
>
> http://lists.repoforge.org/mailman/listinfo/users

Because the rpmforge package stomps on the SA package in CentOS base, it
is in the rpmforge-extras repo, not the mainline rpmforge repo.

Add "exclude=spamassassin" to /etc/yum.repos.d/CentOS-Base.repo to prevent
it from being installed/updated from base/updates, and then just install
it via yum with "yum --enablerepo=rpmforge-extras install spamassassin".





John
-- 
The basic problem can be summed up with four numbers:

0.26% of Americans give more than $200 in a congressional election;
0.05% max out;
0.01% give more than $10,000;
.63% -- 196 Americans -- have given more than 80% of the
 superPAC money spent so far in this election.

-- Larry Lessig: The corruption of the American political system, Durham, NC,
   posted 13 June 2012 by Melanie Chernoff




Re: autolearn_discriminator callback not getting called.

2013-05-10 Thread Karsten Bräckelmann
On Fri, 2013-05-10 at 08:57 -0700, psychobyte wrote:
> So I was able to write a plugin that overrides the AWL
> check_from_in_auto_whitelist() eval rule. Thanks for the help Karsten.

You're welcome. Was actually fun digging through the code.


> # awl_ignore_from postmaster mailer-daemon

Configuration option, nice, yeah.

> ## Overridden AWL params
> #
> # header AWL eval:awl_ignore_check_from_in_auto_whitelist()
> # describe AWL From: address is in the auto white-list
> # tflags AWL userconf noautolearn
> # priority AWL 1000

> # Replace check_from_in_auto_whitelist()
> #
> sub awl_ignore_check_from_in_auto_whitelist {
>   my ($self, $pms) = @_;
[...]
>   ## ignore addresses in awl_ignore_from
>   foreach (keys %{$pms->{conf}->{awl_ignore_from}}) {
>     if ($from =~ /$_\@/) {
>       dbg("auto-whitelist: AWL ignoring " . $from);
>       return 0;
>     }
>   }
>
>   # find the earliest usable originating IP.  ignore private nets
>   my $origip;

Whoa, is this... Yes, a copy of the check_from_in_auto_whitelist()
function from the AWL plugin. Code duplication.

Any reason you didn't just hack the AWL.pm code? All you would need is
the contents of your plugin's sub set_config, and the single foreach
loop doing the actual work.

Slightly more than 10 lines, including your POD. (Yay for that, btw!)

No overriding of the existing AWL rule definition, just a single conf
line. No naughty code duplication.
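
The "single foreach loop doing the actual work" boils down to a local-part match against the configured names. A rough Python sketch of that check follows; it is not the plugin's Perl and not AWL.pm code, and the function name is_ignored is hypothetical. Unlike the posted loop, this version escapes each name and anchors the match at the start of the address, which is one possible way to handle names containing periods.

```python
import re


def is_ignored(from_addr, ignore_locals):
    """Return True if the address's local part equals an ignored name."""
    addr = from_addr.lower()
    for name in ignore_locals:
        # re.escape keeps a "." in the name literal; re.match anchors at the
        # start, so "notpostmaster@..." does not match "postmaster".
        if re.match(re.escape(name.lower()) + r"@", addr):
            return True
    return False
```

For example, is_ignored("Postmaster@example.com", ["postmaster", "mailer-daemon"]) is true, so such a sender would be skipped before any AWL lookup or update.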


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Default Bayes Database

2013-05-10 Thread Karsten Bräckelmann
On Fri, 2013-05-10 at 17:58 -0400, David F. Skoll wrote:
> On Fri, 10 May 2013 23:14:36 +0200 Karsten Bräckelmann wrote:
>
> We (probably) have a much larger sample population, so this tends not
> to be as much of a problem for us.

This thread is about a default Bayes database, suitable for distribution.
Not a humongous database with millions of tokens.

It also would have to be usable on small sites, as well as company wide.
Train on error should not be overruled by the sheer number of tokens and
occurrences of them.

> Again, the key is a large sample size.

Yup. In the outlined case, the large sample size would most likely push
that token towards no man's land. It is, after all, a totally valid and
actually used word.

You asked for cases of "your ham is someone else's spam". That is
precisely one such case.

Your repeated counter-argument / solution of a large sample size
translates to neither ham nor spam. Not helpful.
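
A toy calculation shows why pooling opposite verdicts lands on "neither". The sketch below assumes a naive hit-count ratio with equal corpus sizes and invented counts; it is not SpamAssassin's Bayes math, and the names pooled, me and neighbor are hypothetical.

```python
def spam_probability(spam_hits, ham_hits):
    """Naive per-token estimate from raw hit counts (equal corpus sizes assumed)."""
    return spam_hits / (spam_hits + ham_hits)


def pooled(*users):
    """Sum (spam_hits, ham_hits) pairs across users, then estimate once."""
    spam = sum(u[0] for u in users)
    ham = sum(u[1] for u in users)
    return spam_probability(spam, ham)


me = (40, 0)        # token only ever seen in my spam
neighbor = (0, 40)  # token only ever seen in my neighbor's ham
```

Individually the token scores 1.0 for one user and 0.0 for the other; pooled(me, neighbor) gives 0.5, a neutral value that helps neither of them.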

We're talking Bayes, thus in tokens. Spam for me, ham for my neighbor
(yes, literally).

> These are edge cases that are pretty easily handled with personal
> Bayes databases or whitelisting if the system keeps getting it wrong.

Exactly. Personal Bayes databases. The opposite of a default database.


> > Paypal. And them notifying their customers about changes in the terms
> > of use. And actually sending out the full terms of use in the same
> > mail. In this case, again, German -- but they managed to score a
> > whopping 12.2 once for me. Yes, of course, BAYES_99.
>
> Was this with your personal Bayes data?  Even that can be wrong sometimes...

Yes, it was. And yes, it can. :)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Bayes Data Base

2013-05-10 Thread Karsten Bräckelmann
On Fri, 2013-05-10 at 16:02 -0600, Rick Cone wrote:
> I was curious if somebody out there publishes a Spamassassin Bayes
> SPAM/HAM data base that someone could buy or subscribe to?  If so,
> please provide details if known.

Wow, I'm floored.

Reading the last three days' worth of posts might get you a pretty clear
picture and an answer to your question.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Default Bayes Database

2013-05-10 Thread John Hardin

On Fri, 10 May 2013, David F. Skoll wrote:


> Anyway, my main point is this: Don't dismiss a shared Bayes database
> without supplying evidence that it's a bad idea. :)


Care to share your database? :)

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  From the Liberty perspective, it doesn't matter if it's a
  jackboot or a Birkenstock smashing your face. -- Robb Allen
---
 344 days since the first successful private support mission to ISS (SpaceX)


Re: Default Bayes Database

2013-05-10 Thread Bob Proulx
David F. Skoll wrote:
> Bob Proulx wrote:
> > And would you suggest distributing your well-averaged database to
> > people who install SpamAssassin so as to seed their Bayes?
>
> We have a distribution mechanism built into our software.
>
> > I think having users start with a blank slate and then start learning
> > from their own messages makes the most sense.
>
> Maybe.  But I know that our (commercial) customers expect high catch
> rates out of the box, and we get that with our shared Bayes database.
>
> > And users can always learn from their current mailbox of past
> > messages so it isn't much hardship.
>
> Right; pretend you're a salesperson trying to sell an anti-spam product.
> "Oh, you just have to go through your old mailbox and classify a few
> hundred messages by hand... then the system will work great!"
>
> No sale.

Your database sounds just simply wonderful.  Where can I download this
database so that I can start using it?

Bob


Re: [SA-Users] Re: OT: installing on CentOS 6.4

2013-05-10 Thread Jari Fredriksson
11.05.2013 01:03, John R. Dennison wrote:
> On Fri, May 10, 2013 at 02:42:21PM -0400, Bowie Bailey wrote:
> > That's strange.  Now that I actually try to install it myself, I see
> > the same thing (my home server uses this repo, but I haven't updated
> > in a while).  But if you follow the link for the list of packages on
> > the wiki page, it lists 3.3.2 for Centos 4, 5, and 6.
> >
> > And, even stranger, if I browse directly to the repo url, I can't
> > find spamassassin at all.
> >
> > Maybe you should ask on their list.
> >
> > http://lists.repoforge.org/mailman/listinfo/users
>
> Due to the fact that the rpmforge package stomps on the SA package in
> CentOS base it is in the rpmforge-extras repo not the mainline rpmforge
> repo.
>
> Add "exclude=spamassassin" to /etc/yum.repos.d/CentOS-Base.repo to prevent
> it from being installed/updated from base/updates and then just install
> it via yum with "yum --enablerepo=rpmforge-extras install spamassassin".


Thank You Very Much :)

This works.

-- 

Its name is Public Opinion.  It is held in reverence.  It settles everything.
Some think it is the voice of God.
-- Mark Twain






Re: Default Bayes Database

2013-05-10 Thread Karsten Bräckelmann
On Fri, 2013-05-10 at 17:49 -0400, David F. Skoll wrote:
> Right; pretend you're a salesperson trying to sell an anti-spam product.
> "Oh, you just have to go through your old mailbox and classify a few
> hundred messages by hand... then the system will work great!"
>
> No sale.

Most likely, and no one argued against it.

Last time I checked, SA was a project aimed at the admin type of guy,
not the pointy-haired boss who simply wants to buy a device and be done
with the issue, nor the end user. Also, SA itself is not for sale...

The OP, Andrew, clearly is the admin type. I'd guess the mere fact that
he's actively discussing this, and turned to the SA users list in the
first place, is a telltale sign.


On the topic of Bayes: No one argued against a shared database. As a
matter of fact, SA does deliberately support site-wide shared Bayes, and
offers documentation. However,

  shared != default


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: installing on CentOS 6.4

2013-05-10 Thread Jari Fredriksson
10.05.2013 19:31, Randal, Phil wrote:
> pyzor and perl-Razor-Agent are in epel.
>
> Cheers,
>
> Phil

EPEL is a must; I wonder how I forgot it. Now I have it, and those are
running. Thanks!

> -----Original Message-----
> From: Jari Fredriksson [mailto:ja...@iki.fi]
> Sent: 10 May 2013 17:22
> To: SpamAssassin Users
> Subject: OT: installing on CentOS 6.4
>
> I'm installing the latest CentOS, and would like to have SA on it too.
>
> But to my disappointment, it has only SA 3.1.1 and neither Razor nor Pyzor.
>
> What would be the best way to get a reasonably up-to-date SA onto this box?
>
> --
>
> You can rent this space for only $5 a week.





-- 

Good day for a change of scene.  Repaper the bedroom wall.






Re: Default Bayes Database

2013-05-10 Thread David F. Skoll
On Fri, 10 May 2013 15:52:01 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:

> > Anyway, my main point is this: Don't dismiss a shared Bayes database
> > without supplying evidence that it's a bad idea. :)
>
> Care to share your database? :)

Ah... hmm. :)

I would be happy to share it with SA developers who might be
contemplating making some sort of shared Bayes feature in SA and who
would only use the database for research purposes.

But I can't make it generally available.  If anyone is really interested,
please contact me off-list.

Regards,

David.