Re: The googolbees are getting craftier
On Mon, 21 Jan 2008, John D. Hardin wrote:
> m,https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl),i

If I understand that pattern, both the '*' are 'unbounded'???
This might 'break' your spamfilter, if spamassassin gobbles up all memory during analysis.

Better replace any unbounded '*' by a reasonable length {0,N}, with N a little more than the seen strings.

Stucki

-- 
Christoph von Stuckrad ('stucki' on IRCnet), [EMAIL PROTECTED]
Freie Universitaet Berlin, Mathematik Informatik EDV, Takustr. 9 / 14195 Berlin
Tel(days): +49 30 838-75 459, Tel(else): +49 30 77 39 6600, Fax(alle): +49 30 838-75 454
Re: The googolbees are getting craftier
On Tue, 2008-01-22 at 13:01 +0100, Chr. v. Stuckrad wrote:
> On Mon, 21 Jan 2008, John D. Hardin wrote:
> > m,https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl),i
>
> If I understand that pattern, both the '*' are 'unbounded'???
> This might 'break' your spamfilter, if spamassassin gobbles up all memory during analysis.
>
> Better replace any unbounded '*' by reasonable length {0,N}, with N a little more than the seen strings.

You've snipped the beginning of the rule definition. It's a uri rule, and thus the RE will be matched only against the URIs identified in the mail body -- and a URI by itself is usually rather bounded. :)

guenther
Odd code at end of spam
Bank phish in our spam trap ends as follows. Is it just junk, or is it trying to do something?

Joseph Brennan
Columbia University Information Technology

<p><font face="Times New Roman, Times, serif">Thank you for banking with us!</font></p>
<p><font face="Times New Roman, Times, serif">HuntingtonNational Bank CustomerSupport<br></font></p>
<p><font face="Times New Roman, Times, serif">*</font></p>
<p><font face="Times New Roman, Times, serif">&copy; 2008 Huntington Bancshares Incorporated</font></p>
<p><font color="#FD" face="Times New Roman, Times, serif">13LC: 0x550, 0x2117, 0x190, 0x0, 0x20851291, 0x24190140, 0x1YNAstack: 0x46987084, 0x570, 0x5, 0x674, 0x3, 0x5, 0x6, 0x99947115, 0x36603937, 0x395 0x7708, 0x78, 0x8679, 0x809, 0x190, 0x391 common: 0x97600652, 0x78246484, 0x42, 0x831, 0x002 stack: 0x46, 0x62, 0x850, 0x65716791, 0x16, 0x98, 0x9242, 0x9, 0x281, 0x81644476 0x8, 0x572, 0x72, 0x62, 0x31664848, 0x0620 F9E: 0x485, 0x637, 0x94, 0x8, 0x65011960, 0x9, 0x75, 0x9826, 0x394, 0x89, 0x56610819, 0x55687550, 0x301, 0x47 VM5: 0x4610, 0x79, 0x37, 0x7660, 0x129, 0x9928, 0x93, 0x870, 0x3, 0x00, 0x08883255</font></p>
<p><font color="#F8" face="Times New Roman, Times, serif"><span>X9YC: 0x8, 0x0882root: 0x89202124, 0x1, 0x71 GDW2: 0x7878, 0x975, 0x74, 0x35, 0x31 1DH, interface, 69W, function, rev.0x92, 0x34640689 0x3, 0x45512349, 0x882, 0x04, 0x8636, 0x58198764, 0x16904327, 0x8447, 0x9, 0x98, 0x1007, 0x308, 0x15, 0x32, 0x75 726A STTZ 7VX EYYR BVP4 OSX 0x83357572, 0x832, 0x3349, 0x3867, 0x789, 0x57, 0x451 5VH: 0x114, 0x94968150, 0x7054 media: 0x2582, 0x2525, 0x69557963, 0x845, 0x478, 0x608, 0x036, 0x77, 0x4, 0x33, 0x163, 0x002, 0x76, 0x916</span></font></p>
<p><font color="#FA" face="Times New Roman, Times, serif"><span>0x87093522, 0x8, 0x6, 0x5588, 0x99453372, 0x881DQ: 0x661 engine: 0x19, 0x183, 0x5150, 0x823 0x393, 0x271, 0x55, 0x16218081, 0x61, 0x7 type include 54Q 5DPL C7V: 0x9744 update: 0x7, 0x23923010, 0x54445690, 0x7352 revision: 0x74203131, 0x0, 0x56, 0x7810, 0x462, 0x80, 0x076, 0x8580, 0x7372 rev, api, file, dec, cvs, IKHL, rev, close, includeinterface: 0x76652732, 0x7, 0x84104122, 0x706, 0x47116255, 0x9621, 0x5943, 0x2822, 0x92, 0x335, 0x269, 0x6</span></font></p>
</body>
</html>
Re: more efficent big scoring
On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
> On Sat, 19 Jan 2008, Loren Wilton wrote:
> > I would not be terribly surprised to find out that on average there was no appreciable difference in running all rules of all types in priority order, over the current method;
>
> Neither am I. Another thing to consider is the fraction of defined rules that actually hit and affect the score is rather small. The greatest optimization would be to not test REs you know will fail; but how do you do *that*?

Thanks for all the followups on my inquiry. I'm glad the topic is/was considered and it looks like there is some room for development, but I now realize it is not as simple as I thought it might have been.

In answer to the above question, maybe the tests need their own scoring? E.g. fast tests with big spam scores get a higher test score than slow tests with low spam scores.

Maybe there could be some way to establish a hierarchy at startup which groups rule processing into nodes: some nodes finish quickly, some have dependencies, some are negative, etc. Then utilize some sort of looping test (e.g. check every .5 second) which can kill the remaining tests and short-circuit, e.g. any time in the hierarchy that the score is above what the negative tests can fix.

Appreciate the discussion thus far; unfortunately discussion is all I'm able to contribute at this time.

Thanks,
// George

-- 
George Georgalis, information system scientist IXOYE
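The short-circuit idea in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how SpamAssassin schedules rules: the `Rule` class, the `cost` field, the rule names and the score/cost ordering are all hypothetical, and the check runs before each rule rather than on a half-second timer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str                     # hypothetical rule name
    score: float                  # positive = spam sign, negative = ham sign
    cost: float                   # assumed relative cost of running the test
    test: Callable[[str], bool]


def scan(message: str, rules: List[Rule], threshold: float) -> Tuple[float, List[str], int]:
    """Return (score, names of rules that hit, number of rules actually run)."""
    # Cheap, high-impact rules first: sort by |score| / cost, descending.
    ordered = sorted(rules, key=lambda r: abs(r.score) / r.cost, reverse=True)
    total, hits, ran = 0.0, [], 0
    for i, rule in enumerate(ordered):
        # The most that the not-yet-run negative rules could still subtract.
        rescue = sum(r.score for r in ordered[i:] if r.score < 0)
        if total + rescue >= threshold:
            break  # no remaining negative rule can pull the score back under
        ran += 1
        if rule.test(message):
            total += rule.score
            hits.append(rule.name)
    return total, hits, ran


RULES = [
    Rule("HYP_SLOW_SMALL", 0.5, 10.0, lambda m: "click" in m),
    Rule("HYP_BIG_FAST", 6.0, 1.0, lambda m: "viagra" in m),
    Rule("HYP_WHITELIST", -2.0, 1.0, lambda m: "friend" in m),
]

result = scan("buy viagra, click here", RULES, threshold=5.0)
```

In this toy run the slow, low-score rule is never executed: once the fast rule has hit and the only negative rule has been tried, nothing left can bring the total back under the threshold.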
Spamd and MySQL userprefs/ AWL/ Bayes
Hello all...

I've spent the past 2 days trying, utterly unsuccessfully, to get spamd to run against a MySQL database. My head is bloody from banging it on the wall, and now I prostrate myself to the mailing list gods in the hopes that you may be able to help me :)

I'm running SpamAssassin 3.2.3 (from Mandriva 2008.0), MySQL 5.0.45, perl-DBD-mysql-4.005, libdbi-drivers-dbd-mysql-0.8.2.

I have enabled networking in /etc/my.cnf and restarted mysql. I can connect to the SpamAssassin database as the spamassassin user, both via socket and by specifying localhost or 127.0.0.1 on the command line (e.g.: mysql -u spamassassin -p -h 127.0.0.1 spamassassin).

I have populated my /etc/mail/spamassassin/local.cf thusly:

    user_scores_dsn DBI:mysql:spamassassin:localhost
    user_scores_sql_username spamassassin
    user_scores_sql_password PASSWORD
    #user_scores_sql_custom_query select preference, value from userpref where username = _USERNAME_ or username = '$GLOBAL' or username = CONCAT('%',_DOMAIN_) order by ASC
    #
    bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
    bayes_sql_dsn DBI:mysql:spamassassin:localhost
    bayes_sql_username spamassassin
    bayes_sql_password PASSWORD

For the user_scores_dsn line, I have tried the above, plus DBI:mysql:spamassassin:mysql_socket=/var/lib/mysql/mysql.sock

I have run spamd with every combination of flags I can think of (the current is spamd -d -x -q -c -m5). I've tried -Q, I've tried dropping the -c, I've tried in debug mode, etc. etc. ad nauseam.

No matter what iteration of spamd or spamassassin I do, I NEVER see a connection to the database, and my preferences as set in the database are never read (as evidenced by my prefids, I've inserted, deleted, etc. my preferences a whole crapload of times as well).

    mysql> select * from userpref;
    +----------+---------------+-------+--------+
    | username | preference    | value | prefid |
    +----------+---------------+-------+--------+
    | $GLOBAL  | use_bayes     | 1     |     45 |
    | $GLOBAL  | required_hits | 3.50  |     46 |
    | $GLOBAL  | use_razor2    | 1     |     47 |
    | $GLOBAL  | use_pyzor     | 1     |     48 |
    | $GLOBAL  | use_dcc       | 1     |     49 |
    +----------+---------------+-------+--------+
    5 rows in set (0.02 sec)

I've run

    echo yum | spamassassin -D 2>&1 | grep -i sql

with every change I try and it always comes up empty. I have to assume that I've missed a dependency somewhere, or that I've munged my config in some way...

Finally, before you jump on my choice of Mandriva as the problem, I've tried all the above on a fresh installation of Fedora Core 7, and it refuses to work in *exactly* the same manner.

WTF am I doing wrong?! I know there's something stoopid that I'm missing, and it's extra frustrating because I've gotten this to work many many times. My reference installations are all working as well, and I can find NOTHING different between what's worked in the past (and is still working today) and what I'm doing right now, other than updated versions of everything (MySQL, Perl, SA, etc.).

I would be very appreciative of any input that y'all care to share :)

Thank you in advance,
Rubin

-- 
Rubin Bennett
rbTechnologies
[EMAIL PROTECTED]
http://thatitguy.com
(802)223-4448

Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety.
    -Ben Franklin, Historical Review of Pennsylvania, 1759
Re: The googolbees are getting craftier
On Tue, 22 Jan 2008, Chr. v. Stuckrad wrote:
> On Mon, 21 Jan 2008, John D. Hardin wrote:
> > m,https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl),i
>
> If I understand that pattern, both the '*' are 'unbounded'???
> This might 'break' your spamfilter, if spamassassin gobbles up all memory during analysis.
>
> Better replace any unbounded '*' by reasonable length {0,N}, with N a little more than the seen strings.

You're correct, but consider: it's unbounded *within the URI*. If this was a body or rawbody rule I would *definitely* have bounded them.

-- 
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 To prevent conflict and violence from undermining development, effective disarmament programmes are vital...
     -- the UN, who doesn't want to confiscate guns
-----------------------------------------------------------------------
 5 days until the 41st anniversary of the loss of Apollo 1
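For anyone who does want the bounded form suggested earlier in the thread, here is a minimal sketch that translates the pattern from Perl to Python and replaces only the two '*' quantifiers. The limits 8 and 256 are illustrative assumptions, not values proposed by anyone in the thread, and the helper name `both_match` is made up for the example.

```python
import re

# The pattern from the thread, translated from Perl m,...,i syntax.
UNBOUNDED = re.compile(
    r"https?://(?:[^\./]+\.)*goo+gle(?:pages)?\."
    r"(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl)",
    re.IGNORECASE,
)

# Same pattern with both '*' replaced by {0,N}.
# N = 8 host labels and N = 256 path characters are assumed limits.
BOUNDED = re.compile(
    r"https?://(?:[^\./]+\.){0,8}goo+gle(?:pages)?\."
    r"(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.{0,256}[?](?:btni|adurl)",
    re.IGNORECASE,
)


def both_match(uri: str):
    """Return (unbounded pattern matched, bounded pattern matched)."""
    return bool(UNBOUNDED.search(uri)), bool(BOUNDED.search(uri))


short_uri = "http://www.gooogle.com/search?btnI=1&q=x"
long_uri = "http://google.com/" + "a" * 300 + "?adurl=x"
```

The trade-off is visible with `long_uri`: the unbounded pattern still matches it, while the bounded one gives up once the path exceeds the assumed limit, so N has to be chosen with the observed spam in mind.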
Re: more efficent big scoring
On Tue, 22 Jan 2008, George Georgalis wrote:
> On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
> > Neither am I. Another thing to consider is the fraction of defined rules that actually hit and affect the score is rather small. The greatest optimization would be to not test REs you know will fail; but how do you do *that*?
>
> In answer to above question, maybe the tests need their own scoring? eg fast tests and with big spam scores get a higher test score than slow tests with low spam scores.
>
> maybe if there was some way to establish a hierarchy at startup which groups rule processing into nodes. some nodes finish quickly, some have dependencies, some are negative, etc.

Loren mentioned to me in a private email: common subexpressions.

It would be theoretically possible to analyze all the rules in a given set (e.g. body rules) to extract common subexpressions and develop a processing/pruning tree based on that. You'd probably gain some performance scanning messages, but at the cost of how much startup/compiling time?

-- 
John Hardin
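A much simpler stand-in for the common-subexpression idea is a required-literal prefilter: rules that cannot match unless some literal substring is present are grouped under that literal, and one cheap substring check skips the whole group. The sketch below is an assumption-laden illustration, not SpamAssassin or sa-compile code; the rule names are hypothetical and the literals are supplied by hand rather than extracted automatically.

```python
import re
from collections import defaultdict

# (rule name, literal the regex cannot match without, regex source).
# Literals must be lowercase because they are checked against lowercased text.
RULE_SOURCES = [
    ("HYP_GOOGLE_REDIR", "gle", r"goo+gle\.[a-z]{2,3}/+.*[?](?:btni|adurl)"),
    ("HYP_GOOGLE_PAGES", "gle", r"goo+glepages\.com"),
    ("HYP_BANK_THANKS", "banking", r"thank you for banking with \w+"),
]

# Startup cost: compile every regex once and group rules by shared literal.
BY_LITERAL = defaultdict(list)
for name, literal, source in RULE_SOURCES:
    BY_LITERAL[literal].append((name, re.compile(source, re.IGNORECASE)))


def scan(text: str):
    """Return (names of rules that hit, number of regexes actually run)."""
    lowered = text.lower()
    hits, tested = [], 0
    for literal, rules in BY_LITERAL.items():
        if literal not in lowered:
            continue  # every rule in this group is known to fail
        for name, regex in rules:
            tested += 1
            if regex.search(text):
                hits.append(name)
    return hits, tested
```

Whether this pays off depends on how many rules share a literal and how often the literal is absent, which is the startup-versus-scan-time question raised above.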
RE: whois plugin .. where to get it
> As far as blacklisting entire registrars, can you tell us any registrars that are 100% bad? I can't.
>
> Jeff C.

Allegedly 100% spam. Innocent until proven guilty, etc.

NUCLEAR NAMES, INC.
RED PILLAR, INC.
MOUZZ INTERACTIVE INC.
NAMEVIEW, INC.
SOLID HUB, INC.
COMPANA, LLC
RED REGISTER, INC.
CRISP NAMES, INC.
DOMAIN MODE, INC.
Regtime Ltd
TAHOE DOMAINS, INC.
Velnet UK Limited t/a velnet UK Ltd

--Chris
Re: Spamd and MySQL userprefs/ AWL/ Bayes
On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:
> WTF am I doing wrong?!

Not including debug logs in your message.

User prefs does not work with spamassassin, so you won't see anything there, but you should be seeing something for Bayes SQL and AWL SQL if they are configured correctly.

Try running spamassassin -D --lint and sending the output to the list; only then can folks really help you.

Michael
Re: more efficent big scoring
John D. Hardin writes:
> Loren mentioned to me in a private email: common subexpressions.
>
> It would be theoretically possible to analyze all the rules in a given set (e.g. body rules) to extract common subexpressions and develop a processing/pruning tree based on that. You'd probably gain some performance scanning messages, but at the cost of how much startup/compiling time?

I experimented with this concept in my sa-compile work, but I could achieve any speedup on real-world mixed spam/ham datasets. Feel free to give it a try though ;)

--j.
Re: more efficent big scoring
Justin Mason wrote:
> I experimented with this concept in my sa-compile work, but I could achieve any speedup on real-world mixed spam/ham datasets. Feel free to give it a try though ;)

You do mean *couldn't* achieve any speedup, correct?

-Jim
Re: more efficent big scoring
Jim Maul writes:
> Justin Mason wrote:
> > I experimented with this concept in my sa-compile work, but I could achieve any speedup on real-world mixed spam/ham datasets. Feel free to give it a try though ;)
>
> You do mean *couldn't* achieve any speedup, correct?

yep
Re: more efficent big scoring
John D. Hardin writes:
> Loren mentioned to me in a private email: common subexpressions.

Whoops! Matt Kettler mentioned it to me, not Loren. Sorry!

-- 
John Hardin
-----------------------------------------------------------------------
 The difference is that Unix has had thirty years of technical types demanding basic functionality of it. And the Macintosh has had fifteen years of interface fascist users shaping its progress. Windows has the hairpin turns of the Microsoft marketing machine and that's all.
     -- Red Drag Diva
-----------------------------------------------------------------------
 5 days until Wolfgang Amadeus Mozart's 252nd Birthday
Re: Spamd and MySQL userprefs/ AWL/ Bayes
On Tue, 2008-01-22 at 10:45 -0600, Michael Parker wrote:
> On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:
> > WTF am I doing wrong?!
>
> Not including debug logs in your message.
>
> User prefs does not work with spamassassin, so you won't see anything there, but you should be seeing something for Bayes SQL and AWL SQL if they are configured correctly.

What do you mean?! Isn't that what the user_scores_dsn is all about?!

Here's the debug output.

[31488] dbg: logger: adding facilities: all
[31488] dbg: logger: logging level is DBG
[31488] dbg: generic: SpamAssassin version 3.2.3
[31488] dbg: config: score set 0 chosen.
[31488] dbg: util: running in taint mode? no
[31488] dbg: dns: is Net::DNS::Resolver available? yes
[31488] dbg: dns: Net::DNS version: 0.61
[31488] dbg: diag: perl platform: 5.008008 linux
[31488] dbg: diag: module installed: Digest::SHA1, version 2.11
[31488] dbg: diag: module installed: HTML::Parser, version 3.56
[31488] dbg: diag: module installed: Net::DNS, version 0.61
[31488] dbg: diag: module installed: MIME::Base64, version 3.07
[31488] dbg: diag: module installed: DB_File, version 1.815
[31488] dbg: diag: module installed: Net::SMTP, version 2.29
[31488] dbg: diag: module installed: Mail::SPF, version v2.005
[31488] dbg: diag: module installed: Mail::SPF::Query, version 1.999001
[31488] dbg: diag: module installed: IP::Country::Fast, version 604.001
[31488] dbg: diag: module installed: Razor2::Client::Agent, version 2.84
[31488] dbg: diag: module installed: Net::Ident, version 1.20
[31488] dbg: diag: module installed: IO::Socket::INET6, version 2.51
[31488] dbg: diag: module installed: IO::Socket::SSL, version 1.08
[31488] dbg: diag: module installed: Compress::Zlib, version 2.006
[31488] dbg: diag: module installed: Time::HiRes, version 1.86
[31488] dbg: diag: module not installed: Mail::DomainKeys ('require' failed)
[31488] dbg: diag: module not installed: Mail::DKIM ('require' failed)
[31488] dbg: diag: module installed: DBI, version 1.59
[31488] dbg: diag: module installed: Getopt::Long, version 2.35
[31488] dbg: diag: module installed: LWP::UserAgent, version 2.036
[31488] dbg: diag: module installed: HTTP::Date, version 1.47
[31488] dbg: diag: module installed: Archive::Tar, version 1.34
[31488] dbg: diag: module installed: IO::Zlib, version 1.07
[31488] dbg: diag: module installed: Encode::Detect, version 1.00
[31488] dbg: ignore: using a test message to lint rules
[31488] dbg: config: using /etc/mail/spamassassin for site rules pre files
[31488] dbg: config: read file /etc/mail/spamassassin/init.pre
[31488] dbg: config: read file /etc/mail/spamassassin/v310.pre
[31488] dbg: config: read file /etc/mail/spamassassin/v312.pre
[31488] dbg: config: read file /etc/mail/spamassassin/v320.pre
[31488] dbg: config: using /var/lib/spamassassin/3.002003 for sys rules pre files
[31488] dbg: config: using /var/lib/spamassassin/3.002003 for default rules dir
[31488] dbg: config: read file /var/lib/spamassassin/3.002003/updates_spamassassin_org.cf
[31488] dbg: config: using /etc/mail/spamassassin for site rules dir
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_adult.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum0.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum1.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum2.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_eng.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_x30.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header0.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header_eng.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header_x264_x30.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header_x30.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_highrisk.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_html.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_html_eng.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_html_x30.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_oem.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_random.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_specific.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_spoof.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_stocks.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_unsub.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_uri0.cf
[31488] dbg: config: read file
Re: Spamd and MySQL userprefs/ AWL/ Bayes
On Tue, 2008-01-22 at 19:12 +0100, Alex Woick wrote:
> Rubin Bennett wrote on 22.01.2008 17:12:
> > I'm running SpamAssassin 3.2.3 (from Mandriva 2008.0), MySQL 5.0.45, perl-DBD-mysql-4.005, libdbi-drivers-dbd-mysql-0.8.2.
>
> What about perl-DBI-*? The libdbi-* drivers are not for perl, they are for C programming. For database access to MySQL from Perl you need the perl-DBI and the perl-DBD-mysql modules, both. Perl-DBI should be a dependency of every perl-DBD-* module, so it should already be installed, but please check it.

perl-DBI-1.59 and perl-DBD-mysql-4.005 are both installed; sorry for the oversight in my original post.

Rubin
Re: Spamd and MySQL userprefs/ AWL/ Bayes
Rubin Bennett wrote on 22.01.2008 17:12:
> I'm running SpamAssassin 3.2.3 (from Mandriva 2008.0), MySQL 5.0.45, perl-DBD-mysql-4.005, libdbi-drivers-dbd-mysql-0.8.2.

What about perl-DBI-*? The libdbi-* drivers are not for Perl; they are for C programming. For database access to MySQL from Perl you need both the perl-DBI and the perl-DBD-mysql modules. perl-DBI should be a dependency of every perl-DBD-* module, so it should already be installed, but please check it.

Bye,
Alex
Re: Spamd and MySQL userprefs/ AWL/ Bayes
On Jan 22, 2008, at 12:17 PM, Rubin Bennett wrote:
> On Tue, 2008-01-22 at 10:45 -0600, Michael Parker wrote:
> > User prefs does not work with spamassassin, so you won't see anything there, but you should be seeing something for Bayes SQL and AWL SQL if they are configured correctly.
>
> What do you mean?! Isn't that what the user_scores_dsn is all about?!

The spamassassin script. User prefs only work when you run via spamd.

But let's look at the debug output:

> [31490] dbg: bayes: using username: root
> [31490] dbg: bayes: database connection established
> [31490] dbg: bayes: found bayes db version 3
> [31490] dbg: bayes: Using userid: 1

OK, this tells me that Bayes SQL looks to be running just fine. If you read sql/README.bayes it tells you what to look for to test if things are working correctly.

> [31490] dbg: bayes: corpus size: nspam = 2106, nham = 19051
> [31490] dbg: bayes: tok_get_all: token count: 20
> [31490] dbg: bayes: score = 0.472224419305046
> [31490] dbg: bayes: DB expiry: tokens in DB: 133258, Expiry max size: 15, Oldest atime: 1193647841, Newest atime: 1201025739, Last expire: 1195029791, Current time: 1201025739

It even looks like you've got some data in there.

As to the user_prefs in SQL stuff, that will require spamd -D output. Again, read sql/README for details on testing things; maybe you're just not grepping for the right string. When you run spamd under debug it will show you the exact SQL query it is sending. You can run that query by hand to see if it's giving back meaningful data. You might also turn on query logging on your MySQL server (assuming you have the capability) and see what it says spamd is sending.

Michael
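To see what "running the query by hand" should return, here is a small sketch of the user-preference lookup using an in-memory SQLite table as a stand-in for MySQL. Several things are assumptions: the sort column `username` (the query quoted earlier in the thread omits it), the use of `||` in place of MySQL's CONCAT, the idea that later rows override earlier ones, and the per-domain and per-user rows, which are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute(
    "CREATE TABLE userpref ("
    "username TEXT, preference TEXT, value TEXT, prefid INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO userpref VALUES (?, ?, ?, ?)",
    [
        ("$GLOBAL", "use_bayes", "1", 45),           # from the thread
        ("$GLOBAL", "required_hits", "3.50", 46),    # from the thread
        ("%example.com", "required_hits", "4.00", 90),       # hypothetical
        ("rubin@example.com", "required_hits", "5.00", 91),  # hypothetical
    ],
)

# Same shape as the custom query quoted in the thread, with placeholders
# standing in for _USERNAME_ and _DOMAIN_.
QUERY = """
SELECT preference, value FROM userpref
WHERE username = ? OR username = '$GLOBAL' OR username = '%' || ?
ORDER BY username ASC
"""


def effective_prefs(username: str) -> dict:
    """Apply rows in query order, so later (more specific) rows win."""
    domain = username.split("@", 1)[1]
    prefs = {}
    for preference, value in conn.execute(QUERY, (username, domain)):
        prefs[preference] = value
    return prefs
```

With this ordering '$GLOBAL' sorts before '%example.com', which sorts before the full address, so the user's own row ends up overriding the domain and global defaults.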
Re: more efficent big scoring
On Tue, Jan 22, 2008 at 05:24:00PM, Justin Mason wrote:
> Jim Maul writes:
> > You do mean *couldn't* achieve any speedup, correct?
>
> yep

Just wanted to point out, this topic came up when our site's DNS cache service started to fail due to excessive dnsbl queries. My slowdown was due to multiple timeouts and/or delays, probably related to answering joe-job rbldns backscatter -- that's the reason I was looking for an early exit on scans in progress.

// George
Google link spam?
Is anyone else getting these google link spams? They all seem to be endowment ads. Like this...

Is it small? http://www.gooogle.com/search?

Anyone got a rule to kill these?

-- 
Mike B^)
Ruleset load order dependencies
OS - CentOS-5.1 / Redhat ES5

I am getting these messages when I start up SpamAssassin v.3.1.9 from MailScanner v.4.66.5 in --debug-sa mode:

info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'RAZOR2_CHECK'
info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'PYZOR_CHECK'

I find that DCC_CHECK is contained in /usr/share/spamassassin/25_dcc.cf while DIGEST_MULTIPLE is defined in /usr/share/spamassassin/20_net_tests.cf.

I have gathered from the documentation that rules prefaced with 20_ will load before those with 25_. Is this the reason for the error I am seeing, that DIGEST_MULTIPLE refers to a rule not yet loaded? If so, is this a bug, since the files in /usr/share/spamassassin are not, as I understand, to be modified locally?

-- 
View this message in context: http://www.nabble.com/Ruleset-load-order-dependencies-tp15032984p15032984.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Ruleset load order dependencies
On Tue, Jan 22, 2008 at 05:25:19PM -0800, byrnejb wrote:
> info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'RAZOR2_CHECK'
> info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
> info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'PYZOR_CHECK'
>
> I have gathered from the documentation that rules prefaced with 20_ will load before those with 25_.

Right.

> Is this the reason for the error I am seeing, that DIGEST_MULTIPLE refers to a rule not yet loaded? If so, is this a bug since the files in /usr/share/spamassassin are not, as I understand, to be modified locally?

No. The problem is that you don't have the modules loaded which would let the rules get defined. The meta dependencies are checked after everything has loaded.

-- 
Randomly Selected Tagline:
There's not much you can do to ruin strips of marinated boneless chicken breast sauteed with onions and green peppers.
    - the Center for Science in the Public Interest about Chicken Fajitas
Re: Google link spam?
On Tue, 22 Jan 2008, Mike Yrabedra wrote:
> Is anyone else getting these google link spams?

Yes, we've been discussing them for the past week. It's a good idea to check the list archives before asking if there are rules for a particular type of spam.

> http://www.gooogle.com/search?
>
> Anyone got a rule to kill these?

Check the list archives for messages with google in the subject.

-- 
John Hardin
-----------------------------------------------------------------------
 USMC Rules of Gunfighting #4: If your shooting stance is good, you're probably not moving fast enough nor using cover correctly.
-----------------------------------------------------------------------
 5 days until the 41st anniversary of the loss of Apollo 1
Re: Ruleset load order dependencies
> info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'RAZOR2_CHECK'
> info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
> info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'PYZOR_CHECK'

You don't have the DCC plugin enabled, so the DCC_CHECK rule doesn't exist. It is surrounded by #ifplugin lines.

DIGEST_MULTIPLE is checking for a combination of multiple digests that can include DCC. In this case, if DCC isn't enabled, it is the same as not getting a DCC hit, as far as this rule is concerned. So the only bad effect is this warning message, which is hidden way down in the debug messages so that people won't normally see it and worry about it.

Oh, the above also holds for Razor and Pyzor.

Loren
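The behaviour described above, where an undefined dependency counts the same as a rule that did not hit, can be sketched like this. It is only a model: the meta expression is assumed to be "more than one of the three digest checks hit", and the function is not SpamAssassin's actual evaluator.

```python
DEPENDENCIES = ["RAZOR2_CHECK", "DCC_CHECK", "PYZOR_CHECK"]


def digest_multiple(defined_rules: set, hits: set):
    """Return (meta rule hit?, warnings).

    Assumes the meta fires when more than one digest check hit.
    A dependency whose plugin is not loaded is undefined and counts as 0.
    """
    warnings = []
    total = 0
    for dep in DEPENDENCIES:
        if dep not in defined_rules:
            warnings.append(
                "meta test DIGEST_MULTIPLE has undefined dependency '%s'" % dep
            )
            continue  # undefined is treated the same as "did not hit"
        if dep in hits:
            total += 1
    return total > 1, warnings


# DCC plugin not loaded; Razor and Pyzor both hit.
fired, notes = digest_multiple(
    defined_rules={"RAZOR2_CHECK", "PYZOR_CHECK"},
    hits={"RAZOR2_CHECK", "PYZOR_CHECK"},
)
```

So with one plugin missing the meta rule can still fire on the other two; with all three missing it simply never fires, and the only visible effect is the warning.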
Re: Ruleset load order dependencies
> No. The problem is that you don't have the modules loaded which would let the rules get defined. The meta dependencies are checked after everything has loaded.

How do I ensure that the proper modules are loaded and what are they called?
Re: Ruleset load order dependencies
> You don't have the DCC plugin enabled, so the DCC_CHECK rule doesn't exist. It is surrounded by #ifplugin lines.

OK, I modified v310.pre and I will see if that works.
Re: more efficient big scoring
John D. Hardin writes: Loren mentioned to me in a private email: common subexpressions. Whoops! Matt Kettler mentioned it to me, not Loren. Sorry! I was going to mention that I didn't think that had been me. Unless I was asleep when I wrote the reply. Which could have been the case. :-) Loren
Re: more efficient big scoring
maybe if there was some way to establish a hierarchy at startup which groups rule processing into nodes. some nodes finish quickly, some have dependencies, some are negative, etc. Just wanted to point out, this topic came up when site dns cache service started to fail due to excessive dnsbl queries. My slowdown was due to multiple timeouts and/or delay, probably related to answering joe-job rbldns backscatter -- that's the reason I was looking for early exit on scans in process. A little splitting of rules into processing-speed groups is already done. Specifically, the net-based tests, being dependent on external events for completion, are split out from the other tests and are processed in two phases. The first phase issues the request for information over the net, and the second phase then waits for an answer. There is a background routine that harvests incoming net results while other rules are processed, so when a net result is required it may already be present and no delay will be incurred. This is not an area I fully understand, but reading moderately recent comments on Bugzilla leads me to believe that some improvement is still possible here; there are some net tests that (I think) end up waiting immediately for an answer rather than doing the two-phase processing. How much that slows down the result for the overall email probably depends on many factors. Also note that even issuing the requests and then waiting for the result only when it is needed doesn't guarantee that the mail will not have to wait for results. It could be that one of the very first rules processed (due to priority or meta dependency, for instance) will need a net result, and so the entire rule process will be forced to wait on it. As far as splitting non-net rules up based on speed, that isn't very practical. Regex rules should in general be quite fast, and all of them are going to require the use of the processor full-time anyway. 
The speed of the rule will depend on how it is written and the exact content of the email it is processing. So a rule that is dog slow on one email may be blindingly fast on most other emails. I don't know that there is any good way to estimate the speed of a regex simply by looking at it. Loren
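The two-phase handling of net tests described in the message above (issue every request first, run the local rules while the answers are in flight, then wait only for what has not yet arrived) can be sketched roughly as follows. This is a minimal Python illustration, not how SpamAssassin is implemented: the thread pool, the `fake_dnsbl_lookup` stand-in (which sleeps instead of touching the network), and all rule names are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_dnsbl_lookup(name, delay):
    """Hypothetical stand-in for a network query; sleeps, no real DNS."""
    time.sleep(delay)
    return name.endswith("_LISTED")


def scan(net_tests, local_rules, message):
    """Run net tests in two phases around the local rules."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(net_tests))) as pool:
        # Phase 1: issue all requests without waiting for answers.
        pending = {
            name: pool.submit(fake_dnsbl_lookup, name, delay)
            for name, delay in net_tests.items()
        }
        # Local (regex-style) rules run while the lookups are in flight.
        for name, predicate in local_rules.items():
            results[name] = predicate(message)
        # Phase 2: block only on answers that have not arrived yet.
        for name, future in pending.items():
            results[name] = future.result(timeout=5)
    return results


results = scan(
    net_tests={"RBL_A_LISTED": 0.05, "RBL_B_CLEAN": 0.05},
    local_rules={"HAS_GOOGLE_LINK": lambda m: "google" in m.lower()},
    message="Click http://www.gooogle.com/search?",
)
print(results)
```

As the message notes, this ordering only helps when the first rule to need a net result runs late; if a high-priority or meta-dependent rule needs the answer immediately, the scan still waits on it.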
Re: Ruleset load order dependencies
You don't have the DCC plugin enabled, so the DCC_CHECK rule doesn't exist. It is surrounded by #ifplugin lines. OK, I modified v310.pre and I will see if that works Note that some of the net checks require more setup than simply removing the hash mark from the ifplugin line. You may need extra modules loaded, or may need extra setup either in SA or someplace else. Loren
Re: Google link spam?
On Tue, 2008-01-22 at 17:31 -0800, John D. Hardin wrote: On Tue, 22 Jan 2008, Mike Yrabedra wrote: Is anyone else getting these google link spams? I've not had any complaints about them sneaking past the existing rules. Yes, we've been discussing them for the past week. It's a good idea to check the list archives before asking if there are rules for a particular type of spam. Anyone got a rule to kill these? I've run John Hardin's rule all afternoon, and from amongst about 12000 spams I only saw two that hit: Jan 22 17:29:23 sa amavis[16122]: (16122-14) SPAM, [EMAIL PROTECTED] - [EMAIL PROTECTED], Yes, score=7.843 tag=-99 tag2=4.5 kill=6.31 tests=[BODY_ENHANCEMENT=1.608, DNS_FROM_RFC_BOGUSMX=2.125, GOOG_MALWARE_URI=0.1, L_P0F_W=1, RELAY_CN=3, RELAY_US=0.01], autolearn=disabled, quarantine OOrIFqr7nOr2 (spam-quarantine) Jan 22 17:30:22 sa amavis[16422]: (16422-19) SPAM, [EMAIL PROTECTED] - [EMAIL PROTECTED], Yes, score=7.843 tag=-99 tag2=4.5 kill=6.31 tests=[BODY_ENHANCEMENT=1.608, DNS_FROM_RFC_BOGUSMX=2.125, GOOG_MALWARE_URI=0.1, L_P0F_W=1, RELAY_CN=3, RELAY_US=0.01], autolearn=disabled, quarantine hiQD+uJgfngb (spam-quarantine) Both were detected without the rule. I'll watch it for the remainder of the week before I decide whether I should keep it. -- Daniel J McDonald, CCIE #2495, CISSP #78281, CNX Austin Energy http://www.austinenergy.com
RE: whois plugin .. where to get it
On Tue, 2008-01-22 at 11:34 -0500, Chris Santerre wrote: As far as blacklisting entire registrars, can you tell us any registrars that are 100% bad? I can't. Jeff C. Allegedly 100% spam. Innocent until proven guilty, etc.: NUCLEAR NAMES, INC.; RED PILLAR, INC.; MOUZZ INTERACTIVE INC.; NAMEVIEW, INC.; SOLID HUB, INC.; COMPANA, LLC; RED REGISTER, INC.; CRISP NAMES, INC.; DOMAIN MODE, INC.; Regtime Ltd; TAHOE DOMAINS, INC.; Velnet UK Limited t/a velnet UK Ltd. --Chris I would love to block all domains from these registrars, but come to think of it, what is there to prevent them from getting themselves whitelisted by registering good domains? They can register one more domain with an innocent website (say, a wiki news site), and now they are less than 100% spammer registrars. Thanks, Ram