Re: Question on add-to-blacklist
Just wanted to pass along a thank you to those who helped out here and provide a few notes, on my experience, that may help anyone else that is looking at this. By the way, converting to MySQL did alleviate the problems that I was seeing when attempting to apply updates to AWL ... processes that were taking hours are now taking seconds! Along the way I discovered a few things: 1) We have been running SA for several years and the AWL was far too large for any of the scripts, that I downloaded for conversion. 2) The time to load the MySQL database, with per-row insert logic was daunting. Note: I made a few passes at this before hitting on a working solution. I found it, much, faster, to unload the existing DB to a delimited file, and use mysql load to load the AWL from that file. What I wound up with was a scrit to: . Unload the DB (formatting a flat delimited file): This was created from the convert_awl_dbm_to_sql script downloaded from the spamassassin/tools. Substituting output, to stdout, for the mysql insert statement, and modifying the loop to use each rather than attempting to build and array of @k as well as adding some trim functionality: # my @k = grep(!/totscore$/,keys(%h)); # for my $key (@k) { our $rowsread; our $rowsinsert; print stderr Processing: $db \n; while(my ($key, $v) = each %h) { next if $key =~ /totscore$/; # Skip totscore entries $rowsread++; next if ($v 2); # skip one off entries. . . . $rowsinsert++; print '',$opt{'username'},',','',$email,',','',$ip,',', $count,',',$totscore,\n; . . . . Drop and create the AWL table (the drop becuase of earlier experiments). drop table awl; CREATE TABLE awl ( username varchar(100) NOT NULL default '', email varchar(200) NOT NULL default '', ip varchar(10) NOT NULL default '', count int(11) default '0', totscore float default '0', lastupdate timestamp(14) NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (username,email,ip) ) TYPE=MyISAM; . load the MySQL database: LOAD DATA INFILE '/home/larrys/sa_tools/awl.load' REPLACE INTO TABLE awl FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '' (username, email, ip, count, totscore) SET lastupdate = CURRENT_TIMESTAMP; Following this it was a simple matter of sliding the new local.cf into place and restarting processesses. So far everything seems to be running well. Thanks again -- Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com Software Engineer: Full Compass Systems LTD. Phone: 608-831-7330 x 1347 FAX: 608-831-6330 === There are only three sports: bullfighting, mountaineering and motor racing, all the rest are merely games! - Ernest Hemmingway
Re: Question on add-to-blacklist
On Wednesday 03 June 2009, Jari Fredriksson wrote: On Tuesday 02 June 2009, Michael Scheidell wrote: What optional fields are you refering to? I have seen this, on the spamassassin WIKI: CREATE TABLE awl ( username varchar(100) NOT NULL default '', email varchar(200) NOT NULL default '', ip varchar(10) NOT NULL default '', count int(11) default '0', totscore float default '0', PRIMARY KEY (username,email,ip) ) TYPE=MyISAM; Is there a better reference? CREATE TABLE `awl` ( `username` varchar(100) NOT NULL DEFAULT '', `email` varchar(200) NOT NULL DEFAULT '', `ip` varchar(10) NOT NULL DEFAULT '', `count` int(11) DEFAULT '0', `totscore` float DEFAULT '0', `lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`username`,`email`,`ip`) ) ENGINE=InnoDB ; Thanks, I can see that the 'lastupdate' field would prove handy over time. I'm noticing that most of the scripts, that I've found for processing AWL data, including 'convert_awl_dbm_to_sql' will not run, for me, as written. They seem to, invariably, use tie %hash and then use something like: my @k = grep(!/totscore$/,keys(%h)); to build a list of keys. My existing AWL has something over 10Million records, and my system only contains 4G of RAM. I've been rewriting the loops in the scripts that I'm trying to use from: my @k = grep(!/totscore$/,keys(%h)); for my $key (@k) { to: while(my ($key, $v) = each %h) This may be, a bit slower, but scales better to large data volumes. -- Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com Software Engineer: Full Compass Systems LTD. Phone: 608-831-7330 x 1347 FAX: 608-831-6330 === There are only three sports: bullfighting, mountaineering and motor racing, all the rest are merely games! - Ernest Hemmingway
Re: Question on add-to-blacklist
On 3-Jun-2009, at 14:02, Jari Fredriksson wrote: `ip` varchar(10) NOT NULL DEFAULT '', On 03.06.09 17:48, LuKreme wrote: 10? 7 could be enough for now, afaik AWL only stores /16 prefix... PostgreSQL has a IPv4 type btw -- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. Honk if you love peace and quiet.
Re: Question on add-to-blacklist
LuKreme wrote: `ip` varchar(10) NOT NULL DEFAULT '', 10? I'm missing some of the context here, but usually if someone is storing an ip in 10 characters it's because they're storing the ip number rather than the ip address. r...@haven:~# perl -e 'print length(256*256*256*256).\n;' 10 r...@haven:~# Still, if you were doing that, you'd want to use an integer rather than a varchar preferably. -- Mike Cardwell - IT Consultant and LAMP developer Cardwell IT Ltd. (UK Reg'd Company #06920226) http://cardwellit.com/
Re: Question on add-to-blacklist
On 3-Jun-2009, at 14:02, Jari Fredriksson wrote: `ip` varchar(10) NOT NULL DEFAULT '', 10? It's on wiki http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeAwl?highlight=%28awl%29%7C%28sql%29
Re: Question on add-to-blacklist
On Tuesday 02 June 2009, Adam Katz wrote: Larry Starr lar...@fullcompass.com wrote: I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump new spam senders above the Bayes-99 score. Theo Van Dinter responded: Well, the first problem is that the AWL has no impact on Bayes. They're totally independent. Perhaps you want sa-learn ? I believe Larry, like myself, trains both Bayes and AWL (I also report, and my SpamCop account copies KnujOn since bug 6085 is yet to be addressed). The AWL bump gives me a head start on that senders' future spams since they obviously beat SA in the past ... without that bump, AWL would actually be subtracting points! Note, I just started this practice, and the recent 83MB auto-whitelist? thread's trim_whitelist script looks like it will help me tremendously. You are correct, I use it as a temporary bump to score spam higher, which will come down if I receive any volume of HAM from the sender, or take care of itself as the sender is added to various DNSBL's. The AWL is used for those messages that have been fed to Bayes, and still score below a threshold of 6.0 (an arbritrary number to optomize the process. -- Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com Software Engineer: Full Compass Systems LTD. Phone: 608-831-7330 x 1347 FAX: 608-831-6330 === There are only three sports: bullfighting, mountaineering and motor racing, all the rest are merely games! - Ernest Hemmingway
Re: Question on add-to-blacklist
On Tuesday 02 June 2009, Michael Scheidell wrote: I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump new spam senders above the Bayes-99 score. My problem is that this feature seems, extreemly slow. I'm now trying to use the ( --add-to-blacklist ) option and am finding that this is, equally, slow. I'm running it as: spamassassin -d --progress --add-to-blacklist --mbox mboxfile The mboxfile contains the messages whose senders I wish to blacklist, via AWL. The process seems to take anywhere from 5 to 15 minutes, per address. Most likely, you are still using the flat file (db2 , 3 or 4). If you convert to MYSQL based file, and add in some of the optional fields and trim it weekly, it should be faster Yes, I am still using the Flat file (default) DB. I have been toying with setting it up in MySQL, but have not bitten that bullet yet. What optional fields are you refering to? I have seen this, on the spamassassin WIKI: CREATE TABLE awl ( username varchar(100) NOT NULL default '', email varchar(200) NOT NULL default '', ip varchar(10) NOT NULL default '', count int(11) default '0', totscore float default '0', PRIMARY KEY (username,email,ip) ) TYPE=MyISAM; Is there a better reference? -- Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com Software Engineer: Full Compass Systems LTD. Phone: 608-831-7330 x 1347 FAX: 608-831-6330 === There are only three sports: bullfighting, mountaineering and motor racing, all the rest are merely games! - Ernest Hemmingway
Re: Question on add-to-blacklist
On Tuesday 02 June 2009, Michael Scheidell wrote: What optional fields are you refering to? I have seen this, on the spamassassin WIKI: CREATE TABLE awl ( username varchar(100) NOT NULL default '', email varchar(200) NOT NULL default '', ip varchar(10) NOT NULL default '', count int(11) default '0', totscore float default '0', PRIMARY KEY (username,email,ip) ) TYPE=MyISAM; Is there a better reference? CREATE TABLE `awl` ( `username` varchar(100) NOT NULL DEFAULT '', `email` varchar(200) NOT NULL DEFAULT '', `ip` varchar(10) NOT NULL DEFAULT '', `count` int(11) DEFAULT '0', `totscore` float DEFAULT '0', `lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`username`,`email`,`ip`) ) ENGINE=InnoDB ;
Re: Question on add-to-blacklist
On 3-Jun-2009, at 14:02, Jari Fredriksson wrote: `ip` varchar(10) NOT NULL DEFAULT '', 10? -- There is NO Rule six!
Question on add-to-blacklist
I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump new spam senders above the Bayes-99 score. My problem is that this feature seems, extreemly slow. I'm now trying to use the ( --add-to-blacklist ) option and am finding that this is, equally, slow. I'm running it as: spamassassin -d --progress --add-to-blacklist --mbox mboxfile The mboxfile contains the messages whose senders I wish to blacklist, via AWL. The process seems to take anywhere from 5 to 15 minutes, per address. Can anyone offer a faster alternative? Thanks, -- Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com Software Engineer: Full Compass Systems LTD. Phone: 608-831-7330 x 1347 FAX: 608-831-6330 === There are only three sports: bullfighting, mountaineering and motor racing, all the rest are merely games! - Ernest Hemmingway
Re: Question on add-to-blacklist
I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump new spam senders above the Bayes-99 score. My problem is that this feature seems, extreemly slow. I'm now trying to use the ( --add-to-blacklist ) option and am finding that this is, equally, slow. I'm running it as: spamassassin -d --progress --add-to-blacklist --mbox mboxfile The mboxfile contains the messages whose senders I wish to blacklist, via AWL. The process seems to take anywhere from 5 to 15 minutes, per address. Most likely, you are still using the flat file (db2 , 3 or 4). If you convert to MYSQL based file, and add in some of the optional fields and trim it weekly, it should be faster -- Michael Scheidell, CTO |SECNAP Network Security Finalist 2009 Network Products Guide Hot Companies FreeBSD SpamAssassin Ports maintainer _ This email has been scanned and certified safe by SpammerTrap(r). For Information please see http://www.secnap.com/products/spammertrap/ _
Re: Question on add-to-blacklist
Well, the first problem is that the AWL has no impact on Bayes. They're totally independent. Perhaps you want sa-learn ? On Tue, Jun 2, 2009 at 2:32 PM, Larry Starr lar...@fullcompass.com wrote: I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump new spam senders above the Bayes-99 score. My problem is that this feature seems, extreemly slow. I'm now trying to use the ( --add-to-blacklist ) option and am finding that this is, equally, slow. I'm running it as: spamassassin -d --progress --add-to-blacklist --mbox mboxfile The mboxfile contains the messages whose senders I wish to blacklist, via AWL. The process seems to take anywhere from 5 to 15 minutes, per address. Can anyone offer a faster alternative? Thanks, -- Larry G. Starr - lar...@fullcompass.com or sta...@globaldialog.com Software Engineer: Full Compass Systems LTD. Phone: 608-831-7330 x 1347 FAX: 608-831-6330 === There are only three sports: bullfighting, mountaineering and motor racing, all the rest are merely games! - Ernest Hemmingway
Re: Question on add-to-blacklist
Larry Starr lar...@fullcompass.com wrote: I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump new spam senders above the Bayes-99 score. Theo Van Dinter responded: Well, the first problem is that the AWL has no impact on Bayes. They're totally independent. Perhaps you want sa-learn ? I believe Larry, like myself, trains both Bayes and AWL (I also report, and my SpamCop account copies KnujOn since bug 6085 is yet to be addressed). The AWL bump gives me a head start on that senders' future spams since they obviously beat SA in the past ... without that bump, AWL would actually be subtracting points! Note, I just started this practice, and the recent 83MB auto-whitelist? thread's trim_whitelist script looks like it will help me tremendously.