Re: Annoying auto_whitelist

2009-07-12 Thread Matus UHLAR - fantomas
> > RW wrote:
> > > The much more common scenario is that the first spam hits BAYES_50
> > > and subsequent BAYES_99 hits are countered by a negative  AWL score.

> On Fri, 10 Jul 2009 08:09:04 -0400
> Matt Kettler  wrote:
> > Technically, this only counters half the score. It also gets "paid
> > back" later. It raises the stored average that will apply to
> > subsequent messages.

On 10.07.09 18:57, RW wrote:
> So what's the point of including  BAYES_99 in AWL?

The point is not to exclude very useful information, such as the score of
BAYES_00 or BAYES_99, from later e-mail.

> but there's only a benefit if the BAYES_XX score falls, otherwise
> the distortion to the score just gets less bad - I don't see how you
> can describe that as "paid back".   

> > I'd also argue it's a rather rare case. Most of my spam hits BAYES_99
> > the first shot around, and most has varying sender address and IP. The
> > odds of one having increasing score and the same sender address/ip
> > seems extraordinarily unlikely to me.

> If something scarcely ever makes a difference, and on the occasion it
> does, gets it wrong more often than it gets it right, I don't see the
> point in keeping it.

That paragraph was about AWL as a whole, not about including or excluding
BAYES scores in it.
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Honk if you love peace and quiet. 


Re: Annoying auto_whitelist

2009-07-10 Thread RW
On Fri, 10 Jul 2009 08:09:04 -0400
Matt Kettler  wrote:

> RW wrote:

> > The much more common scenario is that the first spam hits BAYES_50
> > and subsequent BAYES_99 hits are countered by a negative  AWL score.
> >   
> Technically, this only counters half the score. It also gets "paid
> back" later. It raises the stored average that will apply to
> subsequent messages.

but there's only a benefit if the BAYES_XX score falls, otherwise
the distortion to the score just gets less bad - I don't see how you
can describe that as "paid back".   
 
 
> I'd also argue it's a rather rare case. Most of my spam hits BAYES_99
> the first shot around, and most has varying sender address and IP. The
> odds of one having increasing score and the same sender address/ip
> seems extraordinarily unlikely to me.

So what's the point of including  BAYES_99 in AWL?

If something scarcely ever makes a difference, and on the occasion it
does, gets it wrong more often than it gets it right, I don't see the
point in keeping it.


Re: Annoying auto_whitelist

2009-07-10 Thread Matt Kettler
RW wrote:
> On Fri, 10 Jul 2009 12:33:51 +0200
> Matus UHLAR - fantomas  wrote:
>
>   
>>>> On Sat, 04 Jul 2009 08:56:35 -0400
>>>> Matt Kettler  wrote:
>>>>
>>>>> Please be aware the AWL is NOT whitelist, or a blacklist, and
>>>>> the scores don't really quite work the way they look. The AWL is
>>>>> essentially an averager, and as such, it's sometimes going to
>>>>> assign negative scores to spam sometimes.
>>>>
>>>> And it works from its own version of the score that ignores
>>>> whitelisting and bayes scores. So if learning a spam leads to the
>>>> next spam from the same address getting a higher bayes score,
>>>> that benefit isn't washed-out by AWL.
>>>
>> On 04.07.09 22:42, RW wrote:
>> 
>>> I take that back, I thought the the BAYES_XX rules were ignored by
>>> AWL, but they aren't.
>>>
>>> Personally I think BAYES should be ignored by AWL, emails from the
>>> same "from address" and ip address will have a lot of tokens in
>>> common.  They should train quickly, and there shouldn't be any need
>>> to "damp-out" that learning.
>>>   
>> I don't think so. Teaching BAYES is a good way to hint AWL which way
>> should it push scores. By ignoring bayes, you could move much spam
>> the ham-way since much of spam isn't catched by other scores than
>> BAYES, and vice versa.
>>
>> 
> Right, but that's only a benefit if the BAYES score drops - remember
> it's an averaging system. Personally I only have a single spam in my
> spam corpus that has a AWL hit and doesn't hit BAYES_99, and that hits
> BAYES_95. Sending multiple spams from the same from address and IP
> address is a gift to Bayesian filters.
>
> The much more common scenario is that the first spam hits BAYES_50 and
> subsequent BAYES_99 hits are countered by a negative  AWL score.
>   
Technically, this only counters half the score. It also gets "paid back"
later. It raises the stored average that will apply to subsequent messages.
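
A rough sketch of that arithmetic, assuming a simplified model in which AWL
keeps a running total and count per sender and adjusts each message by
factor * (stored mean - current score). The name simulate_awl and the example
scores are made up for illustration; the real plugin has more detail.

```python
def simulate_awl(scores, factor=0.5):
    """Return (awl_adjustment, final_score) per message for one sender.

    Simplified model (an assumption, not the plugin's exact code): the
    stored mean is built from pre-AWL scores, and the adjustment pulls
    each message part of the way toward that mean.
    """
    total = 0.0
    count = 0
    results = []
    for score in scores:
        if count == 0:
            delta = 0.0  # no history yet, so no adjustment
        else:
            mean = total / count
            delta = (mean - score) * factor
        results.append((delta, score + delta))
        # The pre-AWL score goes into the history for later messages.
        total += score
        count += 1
    return results


# Hypothetical sender: first spam scores 2.0, later ones score 8.0.
for delta, final in simulate_awl([2.0, 8.0, 8.0, 8.0]):
    print(f"AWL {delta:+.2f} -> final {final:.2f}")
```

In this model the second message loses half of the gap to the stored mean,
and the penalty shrinks on later messages as the mean rises.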

I'd also argue it's a rather rare case. Most of my spam hits BAYES_99
the first shot around, and most has varying sender address and IP. The
odds of one having increasing score and the same sender address/ip seems
extraordinarily unlikely to me.

Besides, the real problem there isn't the AWL, but the fact that the
first message scored low.

Are you really seeing cases where this is causing false negatives, or
are you just pontificating about what's possible?




Re: Annoying auto_whitelist

2009-07-10 Thread RW
On Fri, 10 Jul 2009 12:33:51 +0200
Matus UHLAR - fantomas  wrote:

> > > On Sat, 04 Jul 2009 08:56:35 -0400
> > > Matt Kettler  wrote:
> > > > Please be aware the AWL is NOT whitelist, or a blacklist, and
> > > > the scores don't really quite work the way they look. The AWL is
> > > > essentially an averager, and as such, it's sometimes going to
> > > > assign negative scores to spam sometimes.
> 
> > > And it works from its own version of the score that ignores
> > > whitelisting and bayes scores. So if learning a spam leads to the
> > > next spam from the same address getting a higher bayes score,
> > > that benefit isn't washed-out by AWL. 
> 
> On 04.07.09 22:42, RW wrote:
> > I take that back, I thought the the BAYES_XX rules were ignored by
> > AWL, but they aren't.
> > 
> > Personally I think BAYES should be ignored by AWL, emails from the
> > same "from address" and ip address will have a lot of tokens in
> > common.  They should train quickly, and there shouldn't be any need
> > to "damp-out" that learning.
> 
> I don't think so. Teaching BAYES is a good way to hint AWL which way
> should it push scores. By ignoring bayes, you could move much spam
> the ham-way since much of spam isn't catched by other scores than
> BAYES, and vice versa.
> 
Right, but that's only a benefit if the BAYES score drops - remember
it's an averaging system. Personally I only have a single spam in my
spam corpus that has a AWL hit and doesn't hit BAYES_99, and that hits
BAYES_95. Sending multiple spams from the same from address and IP
address is a gift to Bayesian filters.

The much more common scenario is that the first spam hits BAYES_50 and
subsequent BAYES_99 hits are countered by a negative  AWL score.

 


Re: Annoying auto_whitelist

2009-07-10 Thread Matus UHLAR - fantomas
> > On Sat, 04 Jul 2009 08:56:35 -0400
> > Matt Kettler  wrote:
> > > Please be aware the AWL is NOT whitelist, or a blacklist, and the
> > > scores don't really quite work the way they look. The AWL is
> > > essentially an averager, and as such, it's sometimes going to assign
> > > negative scores to spam sometimes.

> > And it works from its own version of the score that ignores
> > whitelisting and bayes scores. So if learning a spam leads to the next
> > spam from the same address getting a higher bayes score, that benefit
> > isn't washed-out by AWL. 

On 04.07.09 22:42, RW wrote:
> I take that back, I thought the the BAYES_XX rules were ignored by AWL,
> but they aren't.
> 
> Personally I think BAYES should be ignored by AWL, emails from the same
> "from address" and ip address will have a lot of tokens in common.  They
> should train quickly, and there shouldn't be any need to "damp-out"
> that learning.

I don't think so. Teaching BAYES is a good way to hint to AWL which way it
should push scores. By ignoring BAYES, you could move a lot of spam toward ham,
since much spam isn't caught by any rules other than BAYES, and vice versa.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #9: Out of error messages.


Re: Annoying auto_whitelist

2009-07-05 Thread Benny Pedersen

On Sat, July 4, 2009 20:55, Michelle Konzack wrote:

> To avoid manually learning the MEDS spams I have set my MEDS score
> to 8.00, and I no longer get any spam except "caNN" and "genNN".

perldoc Mail::SpamAssassin::Plugin::AWL

See the AWL factor setting (auto_whitelist_factor). The default is 0.5, so if
you don't like this, change it to 0.25; the spammer then benefits less
if he uses your email / IP.

got it ?

-- 
xpoint



Re: Annoying auto_whitelist

2009-07-05 Thread Benny Pedersen

On Sat, July 4, 2009 20:50, Michelle Konzack wrote:
> Good evening Jari,
>
> On 2009-07-04 13:46:45, Jari Fredriksson wrote:
>> http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeAwl
>
> Thank you for the link, but if I understand it correctly, spamassassin
> then uses ONE database/table for ALL users...  This means the database
> will grow by more than 10,000 rows a day...
>
> Does spamassassin have something like an autoexpire?
>
> Most spam I get has a UNIQUE From: header.  I already collect this
> information using procmail recipes...  And since 2002 I have collected
> over 27 million different e-mails


CREATE TABLE `awl` (
  `username` varchar(100) NOT NULL default '',
  `email` varchar(200) NOT NULL default '',
  `ip` varchar(10) NOT NULL default '',
  `count` int(11) default '0',
  `totscore` float default '0',
  `lastupdate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`username`,`email`,`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `bayes_seen` (
  `id` int(11) NOT NULL default '0',
  `msgid` varchar(200) character set utf8 collate utf8_bin NOT NULL default '',
  `flag` char(1) NOT NULL default '',
  `lastupdate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`,`msgid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


Everything else expires natively in SA. With the lastupdate column, the two
tables above can now be expired from cron; how to do that is up to others to decide :)
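
One way such a cron-driven expiry could look, sketched in Python against an
in-memory SQLite database as a stand-in for MySQL. The 90-day cutoff, the
expire_awl name and the simplified table are assumptions for illustration;
only the lastupdate column comes from the schema above.

```python
import sqlite3
from datetime import datetime, timedelta


def expire_awl(conn, max_age_days=90, now=None):
    """Delete awl rows whose lastupdate is older than max_age_days."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=max_age_days)).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute("DELETE FROM awl WHERE lastupdate < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Demo with a simplified table; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE awl (username TEXT, email TEXT, ip TEXT, "
    "totscore REAL, lastupdate TEXT)"
)
conn.executemany(
    "INSERT INTO awl VALUES (?, ?, ?, ?, ?)",
    [
        ("user", "old@example.com", "10.1", 12.0, "2009-01-01 00:00:00"),
        ("user", "new@example.com", "10.2", 4.5, "2009-07-01 00:00:00"),
    ],
)
removed = expire_awl(conn, max_age_days=90, now=datetime(2009, 7, 5))
print(removed, "row(s) expired")
```

The same DELETE could be run for bayes_seen; picking a sensible age limit is
left to the administrator.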

-- 
xpoint



Re: Annoying auto_whitelist

2009-07-05 Thread Benny Pedersen

On Sat, July 4, 2009 10:20, Michelle Konzack wrote:

> ...because the Spamer From: is in the auto_whitelist.

Argh :/

The From address and the SENDER IP are both in the awl table, so where is the problem?

If the sender IP also matches (it is a fuzzy /16 match), then I see the problem.

And by the way, AWL is NOT a whitelist!

-- 
xpoint



Re: Annoying auto_whitelist

2009-07-04 Thread RW
On Sat, 4 Jul 2009 20:55:12 +0200
Michelle Konzack  wrote:

> On 2009-07-04 13:12:07, RW wrote:
> > So what happens if you don't remove it, what error do you get when
> > you run sa-learn?
> 
> If I do not remove it before "sa-learn --spam", I get a negative
> AWL score.
> 
> If I remove it and run "sa-learn --spam" again, AWL is not
> mentioned.

If you're interested, what I've done is add the following to my
local.cf:

tflags BAYES_00 noautolearn nice learn
tflags BAYES_05 noautolearn nice learn
tflags BAYES_20 noautolearn nice learn
tflags BAYES_40 noautolearn nice learn
tflags BAYES_50 noautolearn learn
tflags BAYES_60 noautolearn learn
tflags BAYES_80 noautolearn learn
tflags BAYES_95 noautolearn learn
tflags BAYES_99 noautolearn learn

This should completely decouple BAYES and AWL, and so remove the lag
between learning and full-scoring (i.e. no more deleting AWL entries
before sa-learn).

*NOTE* that it does require a one-off reset of the AWL database to avoid
weird AWL scores. 
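
To illustrate why these flags decouple the two, here is a small model of the
score AWL works from. It assumes, as this message implies, that rules flagged
noautolearn are left out of the score AWL averages; the rule scores and the
awl_input_score helper are invented for the example.

```python
def awl_input_score(hits, tflags):
    """Sum the scores of hit rules, skipping those flagged noautolearn."""
    return sum(
        score
        for rule, score in hits
        if "noautolearn" not in tflags.get(rule, set())
    )


# Illustrative rule hits for one message (scores are made up).
hits = [("BAYES_99", 3.5), ("HTML_MESSAGE", 0.1), ("URIBL_BLACK", 2.0)]

default_flags = {"BAYES_99": {"learn"}}
decoupled_flags = {"BAYES_99": {"noautolearn", "learn"}}

print(awl_input_score(hits, default_flags))    # BAYES_99 counted
print(awl_input_score(hits, decoupled_flags))  # BAYES_99 ignored
```

With the flag set, a jump in the BAYES score after learning no longer changes
what AWL sees, so AWL has nothing to average away.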




Re: Annoying auto_whitelist

2009-07-04 Thread Jari Fredriksson
> Good evening Jari,
>
> On 2009-07-04 13:46:45, Jari Fredriksson wrote:
>> http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeAwl
>
> Thank you for the link, but if I understand it correctly, spamassassin
> then uses ONE database/table for ALL users...  This means the database
> will grow by more than 10,000 rows a day...
>
> Does spamassassin have something like an autoexpire?
>

In MySQL at least, you can add a timeupdated field to the awl table with the
properties "default current_timestamp on update current_timestamp".

Then run the autoexpire from cron using that field.

> Most spam I get has a UNIQUE From: header.  I already collect this
> information using procmail recipes...  And since 2002 I have collected
> over 27 million different e-mails
>

That is 100-200 megabytes of data, which your current AWL database must
already contain. No big deal for an RDBMS?


Re: Annoying auto_whitelist

2009-07-04 Thread RW
On Sat, 4 Jul 2009 14:09:29 +0100
RW  wrote:

> On Sat, 04 Jul 2009 08:56:35 -0400
> Matt Kettler  wrote:
> 
> > Please be aware the AWL is NOT whitelist, or a blacklist, and the
> > scores don't really quite work the way they look. The AWL is
> > essentially an averager, and as such, it's sometimes going to assign
> > negative scores to spam sometimes.
> 
> And it works from its own version of the score that ignores
> whitelisting and bayes scores. So if learning a spam leads to the next
> spam from the same address getting a higher bayes score, that benefit
> isn't washed-out by AWL. 

I take that back, I thought the BAYES_XX rules were ignored by AWL,
but they aren't.

Personally I think BAYES should be ignored by AWL, emails from the same
"from address" and ip address will have a lot of tokens in common.  They
should train quickly, and there shouldn't be any need to "damp-out"
that learning.


Re: Annoying auto_whitelist

2009-07-04 Thread wolfgang
In an older episode (Saturday, 4. July 2009), Michelle Konzack wrote:

> If I do not remove it before "sa-learn --spam", I get a negative
> AWL score.
>
> If I remove it and run "sa-learn --spam" again, AWL is not
> mentioned.

In my understanding, the fact that the "From:" address is in the AWL 
with a negative score does *not* prevent sa-learn from learning the 
message as spam.

The effect that various tokens from the mail are learned as "spammy" in 
the Bayes DB is far more important in my view.

And since the sender addresses are unique, their negative AWL score 
won't hurt much IMHO - except for increasing the size of the 
auto_whitelist.

So, removing them may be a good idea, but I don't think it is necessary 
for sa-learn to be effective.

My 0.02 EUR.

Regards,

wolfgang

>
> To avoid manually learning the MEDS spams I have set my
> MEDS score to 8.00, and I no longer get any spam except "caNN" and
> "genNN".
>
> Thanks, Greetings and nice Day/Evening
> Michelle Konzack
> Systemadministrator
> Tamay Dogan Network
> Debian GNU/Linux Consultant


Re: Annoying auto_whitelist

2009-07-04 Thread Michelle Konzack
On 2009-07-04 13:12:07, RW wrote:
> So what happens if you don't remove it, what error do you get when you
> run sa-learn?

If I do not remove it before "sa-learn --spam", I get a negative AWL
score.

If I remove it and run "sa-learn --spam" again, AWL is not mentioned.

To avoid manually learning the MEDS spams I have set my MEDS score
to 8.00, and I no longer get any spam except "caNN" and "genNN".

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   c/o Shared Office KabelBW  ICQ #328449886
+49/177/9351947Blumenstasse 2 MSN LinuxMichi
+33/6/61925193 77694 Kehl/Germany IRC #Debian (irc.icq.com)




Re: Annoying auto_whitelist

2009-07-04 Thread Michelle Konzack
Good evening Jari,

On 2009-07-04 13:46:45, Jari Fredriksson wrote:
> http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeAwl

Thank you for the link, but if I understand it correctly, spamassassin
then uses ONE database/table for ALL users...  This means the database
will grow by more than 10,000 rows a day...

Does spamassassin have something like an autoexpire?

Most spam I get has a UNIQUE From: header.  I already collect this
information using procmail recipes...  And since 2002 I have collected
over 27 million different e-mails

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant






Re: Annoying auto_whitelist

2009-07-04 Thread RW
On Sat, 04 Jul 2009 08:56:35 -0400
Matt Kettler  wrote:

> Please be aware the AWL is NOT whitelist, or a blacklist, and the
> scores don't really quite work the way they look. The AWL is
> essentially an averager, and as such, it's sometimes going to assign
> negative scores to spam sometimes.

And it works from its own version of the score that ignores
whitelisting and bayes scores. So if learning a spam leads to the next
spam from the same address getting a higher bayes score, that benefit
isn't washed-out by AWL. 


Re: Annoying auto_whitelist

2009-07-04 Thread Matt Kettler
Michelle Konzack wrote:
> Hello,
>
> I currently get several thousand shop/meds/pill/gen spams a day, and
> some get through my filters. I have to move them to my spam folder
> manually and feed them to "sa-learn --spam", but this does not work...
>
> ...because the spammer's From: is in the auto_whitelist.

Wait a second. The AWL has nothing to do with bayes or sa-learn.

The only reason SA won't learn a message as spam would be if it has
already been learned as spam, as noted in the "bayes_seen" database (or
corresponding SQL table).

> To me this seems to be a bug, because sa-learn has to remove the From:
> from the auto_whitelist and then RESCAN this crap.

Um, the AWL has nothing to do with sa-learn --spam, and this action will
neither consult, nor modify the AWL.

What makes you think the AWL is inhibiting learning?

The AWL is actually going to contain *EVERY* sender that ever sent you
email (because it is an averager, not a whitelist), so if it would
inhibit learning, you'd never be able to learn anything.



Re: Annoying auto_whitelist

2009-07-04 Thread Matt Kettler
Michelle Konzack wrote:
> Hello,
>
> I currently get several thousand shop/meds/pill/gen spams a day, and
> some get through my filters. I have to move them to my spam folder
> manually and feed them to "sa-learn --spam", but this does not work...
>
> ...because the spammer's From: is in the auto_whitelist.
>
> To me this seems to be a bug, because sa-learn has to remove the From:
> from the auto_whitelist and then RESCAN this crap.

Is the AWL actually causing false negatives?

Please be aware the AWL is NOT a whitelist or a blacklist, and the scores
don't quite work the way they look. The AWL is essentially an
averager, and as such, it's sometimes going to assign negative scores to
spam.

This does *NOT* necessarily mean the AWL has "whitelisted" the sender,
unless it pushes it below the required_score. It just means that this
spam scored higher than the last one. i.e.: if a spam scoring +20 gets a
-5 AWL, the AWL still believes the sender is a spammer with a +10
average. If that same sender had instead sent a message scoring 0, the
AWL would have given them a +5.
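
The numbers above can be checked with a few lines of Python. This is only a
sketch of the averaging idea, assuming the adjustment is half the distance
between the sender's stored mean and the current score (a factor of 0.5); the
function name awl_adjustment is made up for the example.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def awl_adjustment(mean_score, message_score, factor=0.5):
    """Adjustment that moves a message's score toward the sender's mean."""
    return (mean_score - message_score) * factor


# Sender with a stored average of +10, as in the example above.
spam_delta = awl_adjustment(10.0, 20.0)
ham_delta = awl_adjustment(10.0, 0.0)

print(spam_delta)  # -5.0: a spam scoring +20 is pulled down to 15
print(ham_delta)   # 5.0: a message scoring 0 is pushed up to 5
print(20.0 + spam_delta >= REQUIRED_SCORE)  # True: still tagged as spam
```

The negative adjustment on the +20 message leaves it well above the
threshold, which is why the sign of the AWL score alone says little.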

Please be sure to read:

http://wiki.apache.org/spamassassin/AwlWrongWay

Before you make too many judgments about what the AWL is doing. Looking
at the score it assigns alone does not tell you anything about what the
AWL is doing.






Re: Annoying auto_whitelist

2009-07-04 Thread RW
On Sat, 4 Jul 2009 10:20:06 +0200
Michelle Konzack  wrote:

> Hello,
> 
> I currently get several thousand shop/meds/pill/gen spams a day, and
> some get through my filters. I have to move them to my spam folder
> manually and feed them to "sa-learn --spam", but this does not
> work...
> 
> ...because the spammer's From: is in the auto_whitelist.
> 
> To me this seems to be a bug, because sa-learn has to remove the
> From: from the auto_whitelist and then RESCAN this crap.

So what happens if you don't remove it, what error do you get when you
run sa-learn?


Re: Annoying auto_whitelist

2009-07-04 Thread Jari Fredriksson
> On 2009-07-04 11:53:27, Jari Fredriksson wrote:
>> Do you have an SQL-based AWL? If not, it might be worth considering,
>> given your volume of email.
>
> AWL in SQL?
>
> Yes, I have a PostgreSQL database available (I mean, each user has one),
> but how can I set up spamassassin to use it?
>
http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeAwl



Re: Annoying auto_whitelist

2009-07-04 Thread Henrik K
On Sat, Jul 04, 2009 at 11:53:27AM +0300, Jari Fredriksson wrote:
> > Hello,
> >
> > I currently get several thousand shop/meds/pill/gen spams a day, and
> > some get through my filters. I have to move them to my spam folder
> > manually and feed them to "sa-learn --spam", but this does not work...
> >
> > ...because the spammer's From: is in the auto_whitelist.
> >
> > To me this seems to be a bug, because sa-learn has to remove the From:
> > from the auto_whitelist and then RESCAN this crap.
> >
> > Over the last two days I uncompressed the spam archives from the last 27
> > weeks (from this year), used "formail" to extract all From: addresses,
> > de-duplicated them and ran
> >
> > for FROM in ${LIST} ; do
> > spamassassin --remove-addr-from-whitelist=${FROM}
> > done
> >
> > which took over 52 hours for 487000 e-mails.  Hell, I have a super fast
> > machine with 15000 RPM SCSI drives and 32 GByte of memory.  That is 2.6
> > e-mails per second...

You are loading a big perl program for every single email, what do you
expect? ;)

You should edit the database directly. If you are not using SQL, it's a bit
trickier; you could modify trim_whitelist to do it, etc.

> Do you have an SQL-based AWL? If not, it might be worth considering,
> given your volume of email.
> 
> With SQL
> 
>  for FROM in ${LIST} ; do
>  mysql -u spamassassin -psecret spamassassin <<EOF
>  delete from awl where email='${FROM}' ;
>  EOF
>  done
> 
> Should be MUCH faster.

It's possible that $FROM may contain quote characters, so they should be
escaped. That's always good practice, even though I doubt any email addresses
contain SQL injection..

Also, you could just write all the SQL statements to a file first and then run
it in one go. That avoids the same pitfall as above, though on a smaller scale. ;)
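
A sketch of both suggestions in Python: a parameterized query sidesteps the
quoting problem, and one connection handles the whole list instead of starting
a client per address. SQLite stands in for MySQL here, and the table layout,
the remove_senders name and the addresses are simplified placeholders.

```python
import sqlite3


def remove_senders(conn, addresses):
    """Delete all awl rows for the given sender addresses in one batch."""
    # Placeholders let the driver handle quotes inside an address.
    conn.executemany(
        "DELETE FROM awl WHERE email = ?",
        [(addr,) for addr in addresses],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE awl (username TEXT, email TEXT, ip TEXT)")
conn.executemany(
    "INSERT INTO awl VALUES (?, ?, ?)",
    [
        ("user", "o'brien@example.com", "10.1"),
        ("user", "spam@example.com", "10.2"),
        ("user", "friend@example.com", "10.3"),
    ],
)
remove_senders(conn, ["o'brien@example.com", "spam@example.com"])
remaining = [row[0] for row in conn.execute("SELECT email FROM awl")]
print(remaining)
```

The address containing an apostrophe is removed without any manual escaping,
and the interpreter is loaded only once for the whole list.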



Re: Annoying auto_whitelist

2009-07-04 Thread Michelle Konzack
On 2009-07-04 11:53:27, Jari Fredriksson wrote:
> Do you have an SQL-based AWL? If not, it might be worth considering,
> given your volume of email.

AWL in SQL?

Yes, I have a PostgreSQL database available (I mean, each user has one),
but how can I set up spamassassin to use it?

> With SQL
> 
>  for FROM in ${LIST} ; do
>  mysql -u spamassassin -psecret spamassassin <<EOF
>  delete from awl where email='${FROM}' ;
>  EOF
>  done
> 
> Should be MUCH faster.

I'd like to try it out, but how do I set it up?

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant

-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
 Michelle Konzack
   c/o Vertriebsp. KabelBW
   Blumenstrasse 2
Jabber linux4miche...@jabber.ccc.de   77694 Kehl/Germany
IRC #Debian (irc.icq.com) Tel. DE: +49 177 9351947
ICQ #328449886Tel. FR: +33  6  61925193




Re: Annoying auto_whitelist

2009-07-04 Thread Jari Fredriksson
> Hello,
>
> I currently get several thousand shop/meds/pill/gen spams a day, and
> some get through my filters. I have to move them to my spam folder
> manually and feed them to "sa-learn --spam", but this does not work...
>
> ...because the spammer's From: is in the auto_whitelist.
>
> To me this seems to be a bug, because sa-learn has to remove the From:
> from the auto_whitelist and then RESCAN this crap.
>
> Over the last two days I uncompressed the spam archives from the last 27
> weeks (from this year), used "formail" to extract all From: addresses,
> de-duplicated them and ran
>
> for FROM in ${LIST} ; do
> spamassassin --remove-addr-from-whitelist=${FROM}
> done
>
> which took over 52 hours for 487000 e-mails.  Hell, I have a super fast
> machine with 15000 RPM SCSI drives and 32 GByte of memory.  That is 2.6
> e-mails per second...

Do you have an SQL-based AWL? If not, it might be worth considering,
given your volume of email.

With SQL

 for FROM in ${LIST} ; do
 mysql -u spamassassin -psecret spamassassin <<EOF
 delete from awl where email='${FROM}' ;
EOF
 done

Should be MUCH faster.

Annoying auto_whitelist

2009-07-04 Thread Michelle Konzack
Hello,

I currently get several thousand shop/meds/pill/gen spams a day, and
some get through my filters. I have to move them to my spam folder
manually and feed them to "sa-learn --spam", but this does not work...

...because the spammer's From: is in the auto_whitelist.

To me this seems to be a bug, because sa-learn has to remove the From:
from the auto_whitelist and then RESCAN this crap.

Over the last two days I uncompressed the spam archives from the last 27
weeks (from this year), used "formail" to extract all From: addresses,
de-duplicated them and ran

for FROM in ${LIST} ; do
spamassassin --remove-addr-from-whitelist=${FROM}
done

which took over 52 hours for 487000 e-mails.  Hell, I have a super fast
machine with 15000 RPM SCSI drives and 32 GByte of memory.  That is 2.6
e-mails per second...

Why is this so slow?

On my intranet server, a NEC 4500MH (Quad-Xeon, 550MHz/4GByte), it takes
around 5-11 seconds to remove a single e-mail.

michelle.konz...@vserver1:~$ apt-cache policy spamassassin
spamassassin:
  Installiert: 3.2.5-2
  Kandidat: 3.2.5-2
  Versions-Tabelle:
 *** 3.2.5-2 0
500 http://ftp.de.debian.org lenny/main Packages
100 /var/lib/dpkg/status


Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
25.9V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


