Re: Question on add-to-blacklist

2009-06-08 Thread Larry Starr
Just wanted to pass along a thank you to those who helped out here and provide 
a few notes, on my experience, that may help anyone else that is looking at 
this.

By the way, converting to MySQL did alleviate the problems that I was seeing 
when attempting to apply updates to AWL ... processes that were taking hours 
are now taking seconds!

Along the way I discovered a few things:
1)  We have been running SA for several years and the AWL was far too large 
for any of the scripts, that I downloaded for conversion.
2)  The time to load the MySQL database, with per-row insert logic was 
daunting.

Note:  I made a few passes at this before hitting on a working solution.

I found it, much, faster, to unload the existing DB to a delimited file, and 
use mysql load to load the AWL from that file.

What I wound up with was a scrit to:
. Unload the DB (formatting a flat delimited file):
  This was created from the convert_awl_dbm_to_sql script downloaded 
  from the spamassassin/tools.  Substituting output, to stdout, for the 
  mysql insert statement, and modifying the loop to use each rather than 
  attempting to build and array of @k as well as adding some trim 
  functionality:

# my @k = grep(!/totscore$/,keys(%h));
# for my $key (@k) {
our $rowsread;
our $rowsinsert;
print stderr Processing: $db \n;
while(my ($key, $v) = each %h)
{
  next if $key =~ /totscore$/; # Skip totscore entries 
  $rowsread++;
  next if ($v  2); # skip one off entries.
  .
  .
  .
  $rowsinsert++;
  print '',$opt{'username'},',','',$email,',','',$ip,',',
$count,',',$totscore,\n;
  .
  .
  .

. Drop and create the AWL table (the drop becuase of earlier experiments).
drop table awl;
CREATE TABLE awl (
  username varchar(100) NOT NULL default '',
  email varchar(200) NOT NULL default '',
  ip varchar(10) NOT NULL default '',
  count int(11) default '0',
  totscore float default '0',
  lastupdate timestamp(14) NOT NULL DEFAULT CURRENT_TIMESTAMP ON
  UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (username,email,ip)
) TYPE=MyISAM;

. load the MySQL database:
LOAD DATA INFILE '/home/larrys/sa_tools/awl.load' 
REPLACE 
INTO TABLE awl 
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '' 
(username, email, ip, count, totscore) 
SET lastupdate = CURRENT_TIMESTAMP;

Following this it was a simple matter of sliding the new local.cf into place 
and restarting processesses.

So far everything seems to be running well.

Thanks again
-- 
Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com
Software Engineer: Full Compass Systems LTD.
Phone: 608-831-7330 x 1347  FAX: 608-831-6330
===
There are only three sports: bullfighting, mountaineering and motor
racing, all the rest are merely games! - Ernest Hemmingway


Re: Question on add-to-blacklist

2009-06-05 Thread Larry Starr
On Wednesday 03 June 2009, Jari Fredriksson wrote:
  On Tuesday 02 June 2009, Michael Scheidell wrote:
  What optional fields are you refering to?
 
  I have seen this, on the spamassassin WIKI:
 
  CREATE TABLE awl (
   username varchar(100) NOT NULL default '',
   email varchar(200) NOT NULL default '',
   ip varchar(10) NOT NULL default '',
   count int(11) default '0',
   totscore float default '0',
   PRIMARY KEY  (username,email,ip)
  ) TYPE=MyISAM;
 
  Is there a better reference?

  CREATE TABLE `awl` (
   `username` varchar(100) NOT NULL DEFAULT '',
   `email` varchar(200) NOT NULL DEFAULT '',
   `ip` varchar(10) NOT NULL DEFAULT '',
   `count` int(11) DEFAULT '0',
   `totscore` float DEFAULT '0',
   `lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
 CURRENT_TIMESTAMP, PRIMARY KEY (`username`,`email`,`ip`)
 ) ENGINE=InnoDB ;

Thanks, I can see that the 'lastupdate' field would prove handy over time.

I'm noticing that most of the scripts, that I've found for processing AWL 
data, including 'convert_awl_dbm_to_sql' will not run, for me, as written.

They seem to, invariably, use tie %hash and then use something like:

 my @k = grep(!/totscore$/,keys(%h));

to build a list of keys.   My existing AWL has something over 10Million 
records, and my system only contains 4G of RAM.

I've been rewriting the loops in the scripts that I'm trying to use from:
  my @k = grep(!/totscore$/,keys(%h));
  for my $key (@k) {

to:
  while(my ($key, $v) = each %h)

This may be, a bit slower, but scales better to large data volumes.

-- 
Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com
Software Engineer: Full Compass Systems LTD.
Phone: 608-831-7330 x 1347  FAX: 608-831-6330
===
There are only three sports: bullfighting, mountaineering and motor
racing, all the rest are merely games! - Ernest Hemmingway


Re: Question on add-to-blacklist

2009-06-04 Thread Matus UHLAR - fantomas
 On 3-Jun-2009, at 14:02, Jari Fredriksson wrote:
  `ip` varchar(10) NOT NULL DEFAULT '',

On 03.06.09 17:48, LuKreme wrote:
 10?

7 could be enough for now, afaik AWL only stores /16 prefix...

PostgreSQL has a IPv4 type btw

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Honk if you love peace and quiet. 


Re: Question on add-to-blacklist

2009-06-04 Thread Mike Cardwell

LuKreme wrote:


 `ip` varchar(10) NOT NULL DEFAULT '',


10?


I'm missing some of the context here, but usually if someone is storing 
an ip in 10 characters it's because they're storing the ip number rather 
than the ip address.


r...@haven:~# perl -e 'print length(256*256*256*256).\n;'
10
r...@haven:~#

Still, if you were doing that, you'd want to use an integer rather than 
a varchar preferably.


--
Mike Cardwell - IT Consultant and LAMP developer
Cardwell IT Ltd. (UK Reg'd Company #06920226) http://cardwellit.com/


Re: Question on add-to-blacklist

2009-06-04 Thread Jari Fredriksson
 On 3-Jun-2009, at 14:02, Jari Fredriksson wrote:
  `ip` varchar(10) NOT NULL DEFAULT '',
 
 
 10?

It's on wiki

http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeAwl?highlight=%28awl%29%7C%28sql%29




Re: Question on add-to-blacklist

2009-06-03 Thread Larry Starr
On Tuesday 02 June 2009, Adam Katz wrote:
 Larry Starr lar...@fullcompass.com wrote:
  I have been using the AWL ( --add-addr-to-blacklist ) for some
  time, to bump new spam senders above the Bayes-99 score.

 Theo Van Dinter responded:
  Well, the first problem is that the AWL has no impact on Bayes.
  They're totally independent.
  Perhaps you want sa-learn ?

 I believe Larry, like myself, trains both Bayes and AWL (I also
 report, and my SpamCop account copies KnujOn since bug 6085 is yet to
 be addressed).  The AWL bump gives me a head start on that senders'
 future spams since they obviously beat SA in the past ... without that
 bump, AWL would actually be subtracting points!

 Note, I just started this practice, and the recent 83MB
 auto-whitelist? thread's trim_whitelist script looks like it will
 help me tremendously.

You are correct, I use it as a temporary bump to score spam higher, which will 
come down if I receive any volume of HAM from the sender, or take care of 
itself as the sender is added to various DNSBL's.

The AWL is used for those messages that have been fed to Bayes, and still 
score below a threshold of 6.0 (an arbritrary number to optomize the process.


-- 
Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com
Software Engineer: Full Compass Systems LTD.
Phone: 608-831-7330 x 1347  FAX: 608-831-6330
===
There are only three sports: bullfighting, mountaineering and motor
racing, all the rest are merely games! - Ernest Hemmingway


Re: Question on add-to-blacklist

2009-06-03 Thread Larry Starr
On Tuesday 02 June 2009, Michael Scheidell wrote:
  I have been using the AWL ( --add-addr-to-blacklist ) for some time, to
  bump new spam senders above the Bayes-99 score.
 
  My problem is that this feature seems, extreemly slow.
 
  I'm now trying to use the ( --add-to-blacklist ) option and am finding
  that this is, equally, slow.
 
  I'm running it as:
  spamassassin  -d --progress --add-to-blacklist --mbox mboxfile
 
  The mboxfile contains the messages whose senders I wish to blacklist,
  via AWL.
 
  The process seems to take anywhere from 5 to 15 minutes, per address.

 Most likely, you are still using the flat file (db2 , 3 or 4).  If you
 convert to MYSQL based file, and add in some of the optional fields and
 trim it weekly, it should be faster

Yes, I am still using the Flat file (default) DB.
I have been toying with setting it up in MySQL, but have not bitten that 
bullet yet.

What optional fields are you refering to?  

I have seen this, on the spamassassin WIKI:

CREATE TABLE awl (
  username varchar(100) NOT NULL default '',
  email varchar(200) NOT NULL default '',
  ip varchar(10) NOT NULL default '',
  count int(11) default '0',
  totscore float default '0',
  PRIMARY KEY  (username,email,ip)
) TYPE=MyISAM;

Is there a better reference?
-- 
Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com
Software Engineer: Full Compass Systems LTD.
Phone: 608-831-7330 x 1347  FAX: 608-831-6330
===
There are only three sports: bullfighting, mountaineering and motor
racing, all the rest are merely games! - Ernest Hemmingway


Re: Question on add-to-blacklist

2009-06-03 Thread Jari Fredriksson
 On Tuesday 02 June 2009, Michael Scheidell wrote:
 What optional fields are you refering to?
 
 I have seen this, on the spamassassin WIKI:
 
 CREATE TABLE awl (
  username varchar(100) NOT NULL default '',
  email varchar(200) NOT NULL default '',
  ip varchar(10) NOT NULL default '',
  count int(11) default '0',
  totscore float default '0',
  PRIMARY KEY  (username,email,ip)
 ) TYPE=MyISAM;
 
 Is there a better reference?

 CREATE TABLE `awl` (
  `username` varchar(100) NOT NULL DEFAULT '',
  `email` varchar(200) NOT NULL DEFAULT '',
  `ip` varchar(10) NOT NULL DEFAULT '',
  `count` int(11) DEFAULT '0',
  `totscore` float DEFAULT '0',
  `lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE 
CURRENT_TIMESTAMP,
  PRIMARY KEY (`username`,`email`,`ip`)
) ENGINE=InnoDB ;



Re: Question on add-to-blacklist

2009-06-03 Thread LuKreme

On 3-Jun-2009, at 14:02, Jari Fredriksson wrote:

 `ip` varchar(10) NOT NULL DEFAULT '',



10?

--
There is NO Rule six!



Question on add-to-blacklist

2009-06-02 Thread Larry Starr
I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump 
new spam senders above the Bayes-99 score.

My problem is that this feature seems, extreemly slow.

I'm now trying to use the ( --add-to-blacklist ) option and am finding that 
this is, equally, slow.

I'm running it as:
spamassassin  -d --progress --add-to-blacklist --mbox mboxfile

The mboxfile contains the messages whose senders I wish to blacklist, via 
AWL.

The process seems to take anywhere from 5 to 15 minutes, per address.

Can anyone offer a faster alternative?

Thanks,
-- 
Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com
Software Engineer: Full Compass Systems LTD.
Phone: 608-831-7330 x 1347  FAX: 608-831-6330
===
There are only three sports: bullfighting, mountaineering and motor
racing, all the rest are merely games! - Ernest Hemmingway


Re: Question on add-to-blacklist

2009-06-02 Thread Michael Scheidell
 I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump
 new spam senders above the Bayes-99 score.
 
 My problem is that this feature seems, extreemly slow.
 
 I'm now trying to use the ( --add-to-blacklist ) option and am finding that
 this is, equally, slow.
 
 I'm running it as:
 spamassassin  -d --progress --add-to-blacklist --mbox mboxfile
 
 The mboxfile contains the messages whose senders I wish to blacklist, via
 AWL.
 
 The process seems to take anywhere from 5 to 15 minutes, per address.
 
Most likely, you are still using the flat file (db2 , 3 or 4).  If you
convert to MYSQL based file, and add in some of the optional fields and trim
it weekly, it should be faster
-- 
Michael Scheidell, CTO
|SECNAP Network Security
Finalist 2009 Network Products Guide Hot Companies
FreeBSD SpamAssassin Ports maintainer


_
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
_


Re: Question on add-to-blacklist

2009-06-02 Thread Theo Van Dinter
Well, the first problem is that the AWL has no impact on Bayes.
They're totally independent.
Perhaps you want sa-learn ?

On Tue, Jun 2, 2009 at 2:32 PM, Larry Starr lar...@fullcompass.com wrote:
 I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump
 new spam senders above the Bayes-99 score.

 My problem is that this feature seems, extreemly slow.

 I'm now trying to use the ( --add-to-blacklist ) option and am finding that
 this is, equally, slow.

 I'm running it as:
 spamassassin  -d --progress --add-to-blacklist --mbox mboxfile

 The mboxfile contains the messages whose senders I wish to blacklist, via
 AWL.

 The process seems to take anywhere from 5 to 15 minutes, per address.

 Can anyone offer a faster alternative?

 Thanks,
 --
 Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com
 Software Engineer: Full Compass Systems LTD.
 Phone: 608-831-7330 x 1347  FAX: 608-831-6330
 ===
 There are only three sports: bullfighting, mountaineering and motor
 racing, all the rest are merely games! - Ernest Hemmingway



Re: Question on add-to-blacklist

2009-06-02 Thread Adam Katz
Larry Starr lar...@fullcompass.com wrote:
 I have been using the AWL ( --add-addr-to-blacklist ) for some 
 time, to bump new spam senders above the Bayes-99 score.

Theo Van Dinter responded:
 Well, the first problem is that the AWL has no impact on Bayes. 
 They're totally independent.
 Perhaps you want sa-learn ?

I believe Larry, like myself, trains both Bayes and AWL (I also
report, and my SpamCop account copies KnujOn since bug 6085 is yet to
be addressed).  The AWL bump gives me a head start on that senders'
future spams since they obviously beat SA in the past ... without that
bump, AWL would actually be subtracting points!

Note, I just started this practice, and the recent 83MB
auto-whitelist? thread's trim_whitelist script looks like it will
help me tremendously.