Re: The googolbees are getting craftier

2008-01-22 Thread Chr. v. Stuckrad
On Mon, 21 Jan 2008, John D. Hardin wrote:

  m,https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl),i

If I understand that pattern correctly, both '*' quantifiers are
unbounded?

This might break your spam filter if SpamAssassin gobbles
up all memory during analysis.  Better to replace each unbounded
'*' with a reasonable bound {0,N}, with N a little larger than the
strings seen so far.

Stucki

-- 
Christoph von Stuckrad  * * |nickname |[EMAIL PROTECTED]   \
Freie Universitaet Berlin   |/_*|'stucki' |Tel(days):+49 30 838-75 459|
Mathematik  Informatik EDV |\ *|if online|Tel(else):+49 30 77 39 6600|
Takustr. 9 / 14195 Berlin   * * |on IRCnet|Fax(alle):+49 30 838-75 454/


Re: The googolbees are getting craftier

2008-01-22 Thread Karsten Bräckelmann
On Tue, 2008-01-22 at 13:01 +0100, Chr. v. Stuckrad wrote:
 On Mon, 21 Jan 2008, John D. Hardin wrote:
 
   m,https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl),i
 
 If I understand that pattern, both the '*' are 'unbounded'???
 
 This might 'break' your spamfilter, if spamassassin gobbles
 up all memory during analysis.  Better replace any unbounded
 '*' by reasonable length {0,N}, with N a little more than the
 seen strings.

You've snipped the beginning of the rule definition. It's a uri rule,
so the RE will be matched only against URIs identified in the mail
body, and those are usually rather bounded themselves. :)

  guenther


-- 
char *t=[EMAIL PROTECTED];
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Odd code at end of spam

2008-01-22 Thread Joseph Brennan



Bank phish in our spam trap ends as follows.  Is it just junk, or
is it trying to do something?

Joseph Brennan
Columbia University Information Technology




pfont face=Times New Roman, Times, serifThank you  for 
banking  with   us!/font/p
pfont   face=Times New Roman, Times, serifHuntingtonNational 
Bank CustomerSupportbr/font/p
pfont   face=Times New Roman, Times, 
serif*/font/p
pfont  face=Times New Roman, Times, serifcopy;   2008 Huntington 
Bancshares   Incorporated/font/p
pfont  color=#FD  face=Times New Roman, Times, serif13LC: 
0x550, 0x2117, 0x190, 0x0, 0x20851291, 0x24190140, 0x1YNAstack: 
0x46987084, 0x570, 0x5, 0x674, 0x3, 0x5, 0x6, 0x99947115, 0x36603937, 0x395 
0x7708, 0x78, 0x8679, 0x809, 0x190, 0x391 common: 0x97600652, 
0x78246484, 0x42, 0x831, 0x002   stack: 0x46, 0x62, 0x850, 0x65716791, 
0x16, 0x98, 0x9242, 0x9, 0x281, 0x81644476  0x8, 0x572, 0x72, 0x62, 
0x31664848, 0x0620 F9E: 0x485, 0x637, 0x94, 0x8, 0x65011960, 0x9, 0x75, 
0x9826, 0x394, 0x89, 0x56610819, 0x55687550, 0x301, 0x47  VM5: 0x4610, 
0x79, 0x37, 0x7660, 0x129, 0x9928, 0x93, 0x870, 0x3, 0x00, 
0x08883255/font/p
pfont  color=#F8 face=Times New Roman, Times, 
serifspanX9YC: 0x8, 0x0882root: 0x89202124, 0x1, 0x71 GDW2: 0x7878, 
0x975, 0x74, 0x35, 0x31   1DH, interface, 69W, function, rev.0x92, 
0x34640689  0x3, 0x45512349, 0x882, 0x04, 0x8636, 0x58198764, 0x16904327, 
0x8447, 0x9, 0x98, 0x1007, 0x308, 0x15, 0x32, 0x75   726A STTZ 7VX EYYR 
BVP4 OSX 0x83357572, 0x832, 0x3349, 0x3867, 0x789, 0x57, 0x451   5VH: 
0x114, 0x94968150, 0x7054  media: 0x2582, 0x2525, 0x69557963, 0x845, 
0x478, 0x608, 0x036, 0x77, 0x4, 0x33, 0x163, 0x002, 0x76, 
0x916/span/font/p
pfont   color=#FA face=Times New Roman, Times, 
serifspan0x87093522, 0x8, 0x6, 0x5588, 0x99453372, 0x881DQ: 0x661 
engine: 0x19, 0x183, 0x5150, 0x823 0x393, 0x271, 0x55, 0x16218081, 
0x61, 0x7   type include 54Q 5DPL   C7V: 0x9744   update: 0x7, 
0x23923010, 0x54445690, 0x7352   revision: 0x74203131, 0x0, 0x56, 0x7810, 
0x462, 0x80, 0x076, 0x8580, 0x7372  rev, api, file, dec, cvs, IKHL, 
rev, close, includeinterface: 0x76652732, 0x7, 0x84104122, 0x706, 
0x47116255, 0x9621, 0x5943, 0x2822, 0x92, 0x335, 0x269, 
0x6/span/font/p

/body
/html





Re: more efficent big scoring

2008-01-22 Thread George Georgalis
On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
> On Sat, 19 Jan 2008, Loren Wilton wrote:
>
>> I would not be terribly surprised to find out that on average
>> there was no appreciable difference in running all rules of all
>> types in priority order, over the current method;
>
> Neither am I. Another thing to consider is the fraction of defined
> rules that actually hit and affect the score is rather small. The
> greatest optimization would be to not test REs you know will fail;
> but how do you do *that*?

Thanks for all the follow-ups on my inquiry. I'm glad the topic is being
considered and it looks like there is some room for development, but
I now realize it is not as simple as I thought it might be.
In answer to the question above, maybe the tests need their own scoring?
E.g. fast tests with big spam scores would get a higher test score than
slow tests with low spam scores.

Maybe there could be some way to establish a hierarchy at startup
that groups rule processing into nodes. Some nodes finish
quickly, some have dependencies, some are negative, etc.

Then use some sort of looping test (e.g. check every .5 seconds)
that can kill the remaining tests and short-circuit, e.g. any time the
score in the hierarchy is above what the negative tests can fix.
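
The idea above can be sketched roughly as follows. This is only an illustration in Python, not how SpamAssassin schedules rules: the Rule class, the cost field, the example rules, and the threshold are all assumptions made for the sketch, and it checks the bounds after every rule rather than on a timer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    score: float                  # positive = spam sign, negative = ham sign
    cost: float                   # assumed relative run time (hypothetical)
    test: Callable[[str], bool]


def priority(rule):
    # "Fast tests with big scores first": absolute score per unit of cost.
    return abs(rule.score) / rule.cost


def scan(message, rules, threshold):
    ordered = sorted(rules, key=priority, reverse=True)
    total, ran = 0.0, []
    for i, rule in enumerate(ordered):
        remaining = ordered[i:]
        max_down = sum(r.score for r in remaining if r.score < 0)
        max_up = sum(r.score for r in remaining if r.score > 0)
        if total + max_down >= threshold:
            break   # spam even if every remaining negative rule hits
        if total + max_up < threshold:
            break   # ham even if every remaining positive rule hits
        ran.append(rule.name)
        if rule.test(message):
            total += rule.score
    return total, ran


# Hypothetical rules, for illustration only.
RULES = [
    Rule("SMALL_SLOW", 0.5, 10.0, lambda m: "click" in m),
    Rule("BIG_FAST", 6.0, 1.0, lambda m: "viagra" in m),
    Rule("WHITELIST", -2.0, 1.0, lambda m: "unsubscribe-me" in m),
]

if __name__ == "__main__":
    print(scan("buy viagra, click here", RULES, threshold=5.0))
```

In this toy run the slow, low-scoring rule is never executed, because the score is already above anything the remaining negative rules could undo.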

I appreciate the discussion thus far; unfortunately discussion is
all I'm able to contribute at this time.

Thanks,
// George



-- 
George Georgalis, information system scientist IXOYE


Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-22 Thread Rubin Bennett
Hello all...
I've spent the past 2 days trying, utterly unsuccessfully, to get spamd
to run against a MySQL database.  My head is bloody from banging it on
the wall, and now I prostrate myself to the mailing list gods in the
hopes that you may be able to help me :)

I'm running SpamAssassin 3.2.3 (from Mandriva 2008.0), MySQL 5.0.45,
perl-DBD-mysql-4.005, libdbi-drivers-dbd-mysql-0.8.2.

I have enabled networking in /etc/my.cnf and restarted mysql.
I can connect to the SpamAssassin database as the spamassassin user,
both via socket and by specifying localhost or 127.0.0.1 on the command
line (e.g.: mysql -u spamassassin -p -h 127.0.0.1 spamassassin).

I have populated my /etc/mail/spamassassin/local.cf thusly:
user_scores_dsn DBI:mysql:spamassassin:localhost
user_scores_sql_username        spamassassin
user_scores_sql_password        PASSWORD
#user_scores_sql_custom_query   select preference, value from userpref
where username= _USERNAME_ or username = '$GLOBAL' or username =
CONCAT('%',_DOMAIN_) order by ASC
#
bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn   DBI:mysql:spamassassin:localhost
bayes_sql_username  spamassassin
bayes_sql_password  PASSWORD

For the user_scores_dsn line, I have tried the above, plus
DBI:mysql:spamassassin:mysql_socket=/var/lib/mysql/mysql.sock

I have run spamd with every combination of flags I can think of (the
current is spamd -d -x -q -c -m5).  I've tried -Q, I've tried dropping
the -c, I've tried in Debug mode, etc. etc. ad nauseam.

No matter what iteration of spamd or spamassassin I do, I NEVER see a
connection to the database, and my preferences as set in the database
are never read (as evidenced by my prefids, I've inserted, deleted, etc.
my preferences a whole crapload of times as well).

mysql> select * from userpref;
+----------+---------------+-------+--------+
| username | preference    | value | prefid |
+----------+---------------+-------+--------+
| $GLOBAL  | use_bayes     | 1     |     45 |
| $GLOBAL  | required_hits | 3.50  |     46 |
| $GLOBAL  | use_razor2    | 1     |     47 |
| $GLOBAL  | use_pyzor     | 1     |     48 |
| $GLOBAL  | use_dcc       | 1     |     49 |
+----------+---------------+-------+--------+
5 rows in set (0.02 sec)

I've run "echo yum | spamassassin -D 2>&1 | grep -i sql" with every
change I try and it always comes up empty.  I have to assume that I've
missed a dependency somewhere, or that I've munged my config in some
way...

Finally, before you jump on my choice of Mandriva as the problem, I've
tried all the above on a fresh installation of Fedora Core 7, and it
refuses to work in *exactly* the same manner.

WTF am I doing wrong?!  I know there's something stoopid that I'm
missing, and it's extra frustrating because I've gotten this to work
many many times.  My reference installations are all working as well,
and I can find NOTHING different between what's worked in the past (and
is still working today) and what I'm doing right now, other than updated
versions of everything (MySQL, Perl, SA, etc.).

I would be very appreciative of any input that y'all care to share :)

Thank you in advance,
Rubin
-- 
Rubin Bennett
rbTechnologies
[EMAIL PROTECTED]
http://thatitguy.com
(802)223-4448

Those who would give up essential liberty to purchase a little
temporary safety deserve neither liberty nor safety.
-Ben Franklin, Historical Review of Pennsylvania, 1759



Re: The googolbees are getting craftier

2008-01-22 Thread John D. Hardin
On Tue, 22 Jan 2008, Chr. v. Stuckrad wrote:

 On Mon, 21 Jan 2008, John D. Hardin wrote:
 
   m,https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl),i
 
 If I understand that pattern, both the '*' are 'unbounded'???
 
 This might 'break' your spamfilter, if spamassassin gobbles
 up all memory during analysis.  Better replace any unbounded
 '*' by reasonable length {0,N}, with N a little more than the
 seen strings.

You're correct, but consider: it's unbounded *within the URI*. If this 
was a body or rawbody rule I would *definitely* have bounded them.
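
For readers who want to try the two variants side by side, here is a minimal Python sketch, not SpamAssassin code. It ports the quoted pattern to Python's re module and compares it with a bounded version along the lines Stucki suggested. The bounds {0,5} and {0,200} and the sample URIs are illustrative assumptions, not values from this thread.

```python
import re

# The uri rule pattern quoted above, ported to Python's re syntax.
UNBOUNDED = re.compile(
    r"https?://(?:[^\./]+\.)*goo+gle(?:pages)?\."
    r"(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl)",
    re.IGNORECASE,
)

# Same pattern with both '*' replaced by {0,N}; the N values are assumptions.
BOUNDED = re.compile(
    r"https?://(?:[^\./]+\.){0,5}goo+gle(?:pages)?\."
    r"(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.{0,200}[?](?:btni|adurl)",
    re.IGNORECASE,
)

SAMPLES = [
    "http://www.gooogle.com/search?btnI=1&q=example",  # hypothetical spam-style URI
    "http://www.google.com/search?q=example",          # ordinary search URI
]


def hits(pattern, uris):
    # One boolean per URI: does the pattern match anywhere in it?
    return [bool(pattern.search(u)) for u in uris]


if __name__ == "__main__":
    print(hits(UNBOUNDED, SAMPLES))
    print(hits(BOUNDED, SAMPLES))
```

On short inputs such as single URIs both variants should behave the same; the bounded form only matters when the text being scanned can be arbitrarily long.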

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]FALaholic #11174 pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 To prevent conflict and violence from undermining development,
 effective disarmament programmes are vital...
  -- the UN, who doesn't want to confiscate guns
---
 5 days until the 41st anniversary of the loss of Apollo 1



Re: more efficent big scoring

2008-01-22 Thread John D. Hardin
On Tue, 22 Jan 2008, George Georgalis wrote:

 On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:

 Neither am I. Another thing to consider is the fraction of defined
 rules that actually hit and affect the score is rather small. The
 greatest optimization would be to not test REs you know will fail;  
 but how do you do *that*?
 
 thanks for all the followups on my inquiry. I'm glad the topic is/was
 considered and it looks like there is some room for development, but
 I now realize it is not as simple as I thought it might have been.
 In answer to above question, maybe the tests need their own scoring?
 eg fast tests and with big spam scores get a higher test score than
 slow tests with low spam scores.
 
 maybe if there was some way to establish a hierachy at startup
 which groups rule processing into nodes. some nodes finish
 quickly, some have dependencies, some are negative, etc.

Loren mentioned to me in a private email: common subexpressions.

It would be theoretically possible to analyze all the rules in a given
set (e.g. body rules) to extract common subexpressions and develop a
processing/pruning tree based on that. You'd probably gain some
performance scanning messages, but at the cost of how much
startup/compiling time?
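
A very rough Python sketch of the pruning idea follows. It is an assumption-laden illustration, not what sa-compile or SpamAssassin does: each hypothetical rule is tagged by hand with a literal string that any match must contain, rules sharing a literal are grouped at startup, and a whole group is skipped when its literal is absent from the message.

```python
import re
from collections import defaultdict

# Hypothetical body rules: (name, literal every match must contain, regex).
RULES = [
    ("R_PILLS_1", "pills", r"cheap\s+pills"),
    ("R_PILLS_2", "pills", r"pills\s+online"),
    ("R_LOAN_1", "loan", r"instant\s+loan"),
]


def build_tree(rules):
    # Startup cost: group compiled regexes under their shared literal.
    tree = defaultdict(list)
    for name, literal, pattern in rules:
        tree[literal].append((name, re.compile(pattern, re.IGNORECASE)))
    return tree


def scan(body, tree):
    lowered = body.lower()
    hits, regexes_run = [], 0
    for literal, group in tree.items():
        if literal not in lowered:
            continue  # prune: no rule in this group can match
        for name, regex in group:
            regexes_run += 1
            if regex.search(body):
                hits.append(name)
    return hits, regexes_run


if __name__ == "__main__":
    print(scan("Buy cheap pills today", build_tree(RULES)))
```

Whether the cheap substring checks actually pay for themselves on real mail is exactly the open question raised above.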

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]FALaholic #11174 pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 To prevent conflict and violence from undermining development,
 effective disarmament programmes are vital...
  -- the UN, who doesn't want to confiscate guns
---
 5 days until the 41st anniversary of the loss of Apollo 1



RE: whois plugin .. where to get it

2008-01-22 Thread Chris Santerre
 As far as blacklisting entire registrars, can you  
 tell us any registrars that are 100% bad?  I can't.
 
 Jeff C.
 

Allegedly 100% spam. Innocent until proven guilty, etc.

NUCLEAR NAMES, INC.
RED PILLAR, INC.
MOUZZ INTERACTIVE INC.
NAMEVIEW, INC.
SOLID HUB, INC.
COMPANA, LLC
RED REGISTER, INC.
CRISP NAMES, INC.
DOMAIN MODE, INC.
Regtime Ltd
TAHOE DOMAINS, INC.
Velnet UK Limited t/a velnet UK Ltd 

--Chris


Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-22 Thread Michael Parker

On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:


> WTF am I doing wrong?!


Not including debug logs in your message.

User prefs does not work with spamassassin, so you won't see anything  
there, but you should be seeing something for Bayes SQL and AWL SQL if  
they are configured correctly.


Try running spamassassin -D --lint and sending the output to the list,  
only then can folks really help you.


Michael



Re: more efficent big scoring

2008-01-22 Thread Justin Mason

John D. Hardin writes:
 On Tue, 22 Jan 2008, George Georgalis wrote:
 
  On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
 
  Neither am I. Another thing to consider is the fraction of defined
  rules that actually hit and affect the score is rather small. The
  greatest optimization would be to not test REs you know will fail;  
  but how do you do *that*?
  
  thanks for all the followups on my inquiry. I'm glad the topic is/was
  considered and it looks like there is some room for development, but
  I now realize it is not as simple as I thought it might have been.
  In answer to above question, maybe the tests need their own scoring?
  eg fast tests and with big spam scores get a higher test score than
  slow tests with low spam scores.
  
  maybe if there was some way to establish a hierachy at startup
  which groups rule processing into nodes. some nodes finish
  quickly, some have dependencies, some are negative, etc.
 
 Loren mentioned to me in a private email: common subexpressions.
 
 It would be theoretically possible to analyze all the rules in a given
 set (e.g. body rules) to extract common subexpressions and develop a
 processing/pruning tree based on that. You'd probably gain some
 performance scanning messages, but at the cost of how much
 startup/compiling time?

I experimented with this concept in my sa-compile work, but I could
achieve any speedup on real-world mixed spam/ham datasets.

Feel free to give it a try though ;)

--j.


Re: more efficent big scoring

2008-01-22 Thread Jim Maul

Justin Mason wrote:
> John D. Hardin writes:
>> On Tue, 22 Jan 2008, George Georgalis wrote:
>>> On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
>>>> Neither am I. Another thing to consider is the fraction of defined
>>>> rules that actually hit and affect the score is rather small. The
>>>> greatest optimization would be to not test REs you know will fail;
>>>> but how do you do *that*?
>>>
>>> thanks for all the followups on my inquiry. I'm glad the topic is/was
>>> considered and it looks like there is some room for development, but
>>> I now realize it is not as simple as I thought it might have been.
>>> In answer to above question, maybe the tests need their own scoring?
>>> eg fast tests and with big spam scores get a higher test score than
>>> slow tests with low spam scores.
>>>
>>> maybe if there was some way to establish a hierachy at startup
>>> which groups rule processing into nodes. some nodes finish
>>> quickly, some have dependencies, some are negative, etc.
>>
>> Loren mentioned to me in a private email: common subexpressions.
>>
>> It would be theoretically possible to analyze all the rules in a given
>> set (e.g. body rules) to extract common subexpressions and develop a
>> processing/pruning tree based on that. You'd probably gain some
>> performance scanning messages, but at the cost of how much
>> startup/compiling time?
>
> I experimented with this concept in my sa-compile work, but I could
> achieve any speedup on real-world mixed spam/ham datasets.
>
> Feel free to give it a try though ;)
>
> --j.




You do mean *couldn't* achieve any speedup, correct?

-Jim



Re: more efficent big scoring

2008-01-22 Thread Justin Mason

Jim Maul writes:
 Justin Mason wrote:
  John D. Hardin writes:
  On Tue, 22 Jan 2008, George Georgalis wrote:
 
  On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
 
  Neither am I. Another thing to consider is the fraction of defined
  rules that actually hit and affect the score is rather small. The
  greatest optimization would be to not test REs you know will fail;  
  but how do you do *that*?
  thanks for all the followups on my inquiry. I'm glad the topic is/was
  considered and it looks like there is some room for development, but
  I now realize it is not as simple as I thought it might have been.
  In answer to above question, maybe the tests need their own scoring?
  eg fast tests and with big spam scores get a higher test score than
  slow tests with low spam scores.
 
  maybe if there was some way to establish a hierachy at startup
  which groups rule processing into nodes. some nodes finish
  quickly, some have dependencies, some are negative, etc.
  Loren mentioned to me in a private email: common subexpressions.
 
  It would be theoretically possible to analyze all the rules in a given
  set (e.g. body rules) to extract common subexpressions and develop a
  processing/pruning tree based on that. You'd probably gain some
  performance scanning messages, but at the cost of how much
  startup/compiling time?
  
  I experimented with this concept in my sa-compile work, but I could
  achieve any speedup on real-world mixed spam/ham datasets.
  
  Feel free to give it a try though ;)
  
  --j.
  
  
 
 You do mean *couldn't* achieve any speedup, correct?

yep


Re: more efficent big scoring

2008-01-22 Thread John D. Hardin
John D. Hardin writes:
 
 Loren mentioned to me in a private email: common subexpressions.

Whoops! Matt Kettler mentioned it to me, not Loren. Sorry!

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]FALaholic #11174 pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The difference is that Unix has had thirty years of technical
  types demanding basic functionality of it. And the Macintosh has
  had fifteen years of interface fascist users shaping its progress.
  Windows has the hairpin turns of the Microsoft marketing machine
  and that's all.-- Red Drag Diva
---
 5 days until Wolfgang Amadeus Mozart's 252nd Birthday



Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-22 Thread Rubin Bennett

On Tue, 2008-01-22 at 10:45 -0600, Michael Parker wrote:
 On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:
 
  WTF am I doing wrong?!
 
 Not including debug logs in your message.
 
 User prefs does not work with spamassassin, so you won't see anything  
 there, but you should be seeing something for Bayes SQL and AWL SQL if  
 they are configured correctly.
 
What do you mean?!  Isn't that what the user_scores_dsn is all about?!
Here's the debug output.
[31488] dbg: logger: adding facilities: all
[31488] dbg: logger: logging level is DBG
[31488] dbg: generic: SpamAssassin version 3.2.3
[31488] dbg: config: score set 0 chosen.
[31488] dbg: util: running in taint mode? no
[31488] dbg: dns: is Net::DNS::Resolver available? yes
[31488] dbg: dns: Net::DNS version: 0.61
[31488] dbg: diag: perl platform: 5.008008 linux
[31488] dbg: diag: module installed: Digest::SHA1, version 2.11
[31488] dbg: diag: module installed: HTML::Parser, version 3.56
[31488] dbg: diag: module installed: Net::DNS, version 0.61
[31488] dbg: diag: module installed: MIME::Base64, version 3.07
[31488] dbg: diag: module installed: DB_File, version 1.815
[31488] dbg: diag: module installed: Net::SMTP, version 2.29
[31488] dbg: diag: module installed: Mail::SPF, version v2.005
[31488] dbg: diag: module installed: Mail::SPF::Query, version 1.999001
[31488] dbg: diag: module installed: IP::Country::Fast, version 604.001
[31488] dbg: diag: module installed: Razor2::Client::Agent, version 2.84
[31488] dbg: diag: module installed: Net::Ident, version 1.20
[31488] dbg: diag: module installed: IO::Socket::INET6, version 2.51
[31488] dbg: diag: module installed: IO::Socket::SSL, version 1.08
[31488] dbg: diag: module installed: Compress::Zlib, version 2.006
[31488] dbg: diag: module installed: Time::HiRes, version 1.86
[31488] dbg: diag: module not installed: Mail::DomainKeys ('require' failed)
[31488] dbg: diag: module not installed: Mail::DKIM ('require' failed)
[31488] dbg: diag: module installed: DBI, version 1.59
[31488] dbg: diag: module installed: Getopt::Long, version 2.35
[31488] dbg: diag: module installed: LWP::UserAgent, version 2.036
[31488] dbg: diag: module installed: HTTP::Date, version 1.47
[31488] dbg: diag: module installed: Archive::Tar, version 1.34
[31488] dbg: diag: module installed: IO::Zlib, version 1.07
[31488] dbg: diag: module installed: Encode::Detect, version 1.00
[31488] dbg: ignore: using a test message to lint rules
[31488] dbg: config: using /etc/mail/spamassassin for site rules pre files
[31488] dbg: config: read file /etc/mail/spamassassin/init.pre
[31488] dbg: config: read file /etc/mail/spamassassin/v310.pre
[31488] dbg: config: read file /etc/mail/spamassassin/v312.pre
[31488] dbg: config: read file /etc/mail/spamassassin/v320.pre
[31488] dbg: config: using /var/lib/spamassassin/3.002003 for sys rules pre files
[31488] dbg: config: using /var/lib/spamassassin/3.002003 for default rules dir
[31488] dbg: config: read file /var/lib/spamassassin/3.002003/updates_spamassassin_org.cf
[31488] dbg: config: using /etc/mail/spamassassin for site rules dir
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_adult.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum0.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum1.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum2.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_eng.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_x30.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header0.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header_eng.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header_x264_x30.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_header_x30.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_highrisk.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_html.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_html_eng.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_html_x30.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_oem.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_random.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_specific.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_spoof.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_stocks.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_unsub.cf
[31488] dbg: config: read file /etc/mail/spamassassin/70_sare_uri0.cf
[31488] dbg: config: read file 

Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-22 Thread Rubin Bennett
On Tue, 2008-01-22 at 19:12 +0100, Alex Woick wrote:
 Rubin Bennett wrote on 22.01.2008 17:12:
 
  I'm running SpamAssassin 3.2.3 (from Mandriva 2008.0), MySQL 5.0.45,
  perl-DBD-mysql-4.005, libdbi-drivers-dbd-mysql-0.8.2.
 
 What about perl-DBI-*? The libdbi-* drivers are not for perl, they are 
 for C programming. For database access to MySQL from Perl you need the 
 perl-DBI and the perl-DBD-mysql modules, both. Perl-DBI should be a 
 dependency of every perl-DBD-* module, so it should already be 
 installed, but please check it.
 
perl-DBI-1.59 and perl-DBD-mysql-4.005 are both installed, sorry for the
oversight in my original post.
Rubin
 Bye
 Alex
-- 
Rubin Bennett
rbTechnologies
[EMAIL PROTECTED]
http://thatitguy.com
(802)223-4448

Those who would give up essential liberty to purchase a little
temporary safety deserve neither liberty nor safety.
-Ben Franklin, Historical Review of Pennsylvania, 1759



Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-22 Thread Alex Woick

Rubin Bennett wrote on 22.01.2008 17:12:

> I'm running SpamAssassin 3.2.3 (from Mandriva 2008.0), MySQL 5.0.45,
> perl-DBD-mysql-4.005, libdbi-drivers-dbd-mysql-0.8.2.


What about perl-DBI-*? The libdbi-* drivers are not for perl, they are 
for C programming. For database access to MySQL from Perl you need the 
perl-DBI and the perl-DBD-mysql modules, both. Perl-DBI should be a 
dependency of every perl-DBD-* module, so it should already be 
installed, but please check it.


Bye
Alex


Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-22 Thread Michael Parker


On Jan 22, 2008, at 12:17 PM, Rubin Bennett wrote:



> On Tue, 2008-01-22 at 10:45 -0600, Michael Parker wrote:
>> On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:
>>> WTF am I doing wrong?!
>>
>> Not including debug logs in your message.
>>
>> User prefs does not work with spamassassin, so you won't see anything
>> there, but you should be seeing something for Bayes SQL and AWL SQL if
>> they are configured correctly.
>
> What do you mean?!  Isn't that what the user_scores_dsn is all about?!



I mean the spamassassin script.  User prefs only work when you run via
spamd.  But let's look at the debug output:




> [31490] dbg: bayes: using username: root
> [31490] dbg: bayes: database connection established
> [31490] dbg: bayes: found bayes db version 3
> [31490] dbg: bayes: Using userid: 1


Ok, this tells me that Bayes SQL looks to be running just fine.  If  
you read sql/README.bayes it tells you what to look for to test if  
things are working correctly.




> [31490] dbg: bayes: corpus size: nspam = 2106, nham = 19051
> [31490] dbg: bayes: tok_get_all: token count: 20
> [31490] dbg: bayes: score = 0.472224419305046
> [31490] dbg: bayes: DB expiry: tokens in DB: 133258, Expiry max size:
> 15, Oldest atime: 1193647841, Newest atime: 1201025739, Last expire:
> 1195029791, Current time: 1201025739


It even looks like you've got some data in there.


As to the user_prefs in SQL stuff, that will require spamd -D output.
Again, read sql/README for details on testing things; maybe you're
just not grepping for the right string.  When you run spamd under
debug it will show you the exact SQL query it is sending.  You can run
that query by hand to see if it's giving back meaningful data.  You
might also turn on query logging on your MySQL server (assuming you have
the capability) and see what it says spamd is sending.
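
As an illustration of running such a query by hand, here is a minimal Python sketch. It makes several assumptions: it uses an in-memory SQLite table as a stand-in for the MySQL userpref table, the query is modeled on the commented-out custom query from the original post (with an ORDER BY column filled in and CONCAT written as || for SQLite), and the per-user row is hypothetical. It is not the query spamd actually sends; the spamd debug output is the authority on that.

```python
import sqlite3


def make_db():
    # In-memory stand-in for the MySQL userpref table shown in the thread.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userpref (username TEXT, preference TEXT, "
        "value TEXT, prefid INTEGER PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT INTO userpref VALUES (?, ?, ?, ?)",
        [
            ("$GLOBAL", "use_bayes", "1", 45),
            ("$GLOBAL", "required_hits", "3.50", 46),
            ("rubin@example.com", "required_hits", "5.00", 50),  # hypothetical row
        ],
    )
    return conn


def load_prefs(conn, username):
    # Collect global, per-domain and per-user rows in one query.
    domain = username.split("@", 1)[1] if "@" in username else ""
    rows = conn.execute(
        "SELECT preference, value FROM userpref "
        "WHERE username = ? OR username = '$GLOBAL' OR username = '%' || ? "
        "ORDER BY username ASC",
        (username, domain),
    ).fetchall()
    prefs = {}
    for preference, value in rows:
        prefs[preference] = value  # rows later in the ordering override earlier ones
    return prefs


if __name__ == "__main__":
    print(load_prefs(make_db(), "rubin@example.com"))
```

In this sketch '$GLOBAL' sorts before an address that starts with a letter, so the per-user row overrides the global one.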


Michael


Re: more efficent big scoring

2008-01-22 Thread George Georgalis
On Tue, Jan 22, 2008 at 05:24:00PM +, Justin Mason wrote:

Jim Maul writes:
 Justin Mason wrote:
  John D. Hardin writes:
  On Tue, 22 Jan 2008, George Georgalis wrote:
 
  On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
 
  Neither am I. Another thing to consider is the fraction of defined
  rules that actually hit and affect the score is rather small. The
  greatest optimization would be to not test REs you know will fail;  
  but how do you do *that*?
  thanks for all the followups on my inquiry. I'm glad the topic is/was
  considered and it looks like there is some room for development, but
  I now realize it is not as simple as I thought it might have been.
  In answer to above question, maybe the tests need their own scoring?
  eg fast tests and with big spam scores get a higher test score than
  slow tests with low spam scores.
 
  maybe if there was some way to establish a hierachy at startup
  which groups rule processing into nodes. some nodes finish
  quickly, some have dependencies, some are negative, etc.
  Loren mentioned to me in a private email: common subexpressions.
 
  It would be theoretically possible to analyze all the rules in a given
  set (e.g. body rules) to extract common subexpressions and develop a
  processing/pruning tree based on that. You'd probably gain some
  performance scanning messages, but at the cost of how much
  startup/compiling time?
  
  I experimented with this concept in my sa-compile work, but I could
  achieve any speedup on real-world mixed spam/ham datasets.
  
  Feel free to give it a try though ;)
  
  --j.
  
  
 
 You do mean *couldn't* achieve any speedup, correct?

yep


Just wanted to point out that this topic came up when our site's DNS
cache service started to fail due to excessive DNSBL queries. My
slowdown was due to multiple timeouts and/or delays, probably
related to answering joe-job rbldns backscatter. That's the
reason I was looking for an early exit on scans in progress.

// George



-- 
George Georgalis, information system scientist IXOYE


Google link spam?

2008-01-22 Thread Mike Yrabedra
Is anyone else getting these google link spams?

They all seem to be endowment ads.

Like this...

Is it small?

http://www.gooogle.com/search?

Anyone got a rule to kill these?


-- 
Mike B^)





Ruleset load order dependencies

2008-01-22 Thread byrnejb

OS - CentOS-5.1 / Redhat ES5

I am getting messages like these when I start up SpamAssassin v.3.1.9 from
MailScanner v.4.66.5 in --debug-sa mode:

info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'RAZOR2_CHECK'
info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'PYZOR_CHECK'

I find that DCC_CHECK is contained in /usr/share/spamassassin/25_dcc.cf

while DIGEST_MULTIPLE is defined in /usr/share/spamassassin/20_net_tests.cf

I have gathered from the documentation that rules prefaced with 20_ will
load before those with 25_.  Is this the reason for the error I am seeing,
that DIGEST_MULTIPLE refers to a rule not yet loaded?  If so, is this a bug
since the files in /usr/share/spamassassin are not, as I understand, to be
modified locally?


-- 
View this message in context: 
http://www.nabble.com/Ruleset-load-order-dependencies-tp15032984p15032984.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Ruleset load order dependencies

2008-01-22 Thread Theo Van Dinter
On Tue, Jan 22, 2008 at 05:25:19PM -0800, byrnejb wrote:
 info: rules: meta test DIGEST_MULTIPLE has undefined dependency
 'RAZOR2_CHECK'
 info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
 info: rules: meta test DIGEST_MULTIPLE has undefined dependency
 'PYZOR_CHECK'
 
 I have gathered from the documentation that rules prefaced with 20_ will
 load before those with 25_.

Right.

 Is this the reason for the error I am seeing,
 that DIGEST_MULTIPLE refers to a rule not yet loaded?  If so, is this a bug
 since the files in /usr/share/spamassassin are not, as I understand, to be
 modified locally?

No.  The problem is that you don't have the modules loaded which would let the
rules get defined.  The meta dependencies are checked after everything has
loaded.

-- 
Randomly Selected Tagline:
There's not much you can do to ruin strips of marinated boneless chicken
 breast sauteed with onions and green peppers.
   - the Center for Science in the Public Interest about Chicken Fajitas




Re: Google link spam?

2008-01-22 Thread John D. Hardin
On Tue, 22 Jan 2008, Mike Yrabedra wrote:

 Is anyone else getting these google link spams?

Yes, we've been discussing them for the past week.

It's a good idea to check the list archives before asking if there are 
rules for a particular type of spam.

 http://www.gooogle.com/search?
 
 Anyone got a rule to kill these?

Check the list archives for messages with google in the subject.

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]FALaholic #11174 pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #4: If your shooting stance is good,
  you're probably not moving fast enough nor using cover correctly.
---
 5 days until the 41st anniversary of the loss of Apollo 1



Re: Ruleset load order dependencies

2008-01-22 Thread Loren Wilton

 info: rules: meta test DIGEST_MULTIPLE has undefined dependency
 'RAZOR2_CHECK'
 info: rules: meta test DIGEST_MULTIPLE has undefined dependency
 'DCC_CHECK'
 info: rules: meta test DIGEST_MULTIPLE has undefined dependency
 'PYZOR_CHECK'


You don't have the DCC plugin enabled, so the DCC_CHECK rule doesn't exist. 
It is surrounded by #ifplugin lines.


DIGEST_MULTIPLE is checking for a combination of multiple digests that can 
include DCC.  In this case if DCC isn't enabled, it is the same as not 
getting a DCC hit, as far as this rule is concerned.  So the only bad effect 
is this warning message, which is hidden way down in the debug messages so 
that people won't normally see it and worry about it.


Oh, the above also holds for Razor and Pyzor.
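
In other words (a simplified sketch, not the real meta evaluator; the function name and the "more than one digest" threshold are assumptions for illustration), a rule that does not exist simply contributes nothing to the count:

```python
DIGEST_RULES = ("RAZOR2_CHECK", "DCC_CHECK", "PYZOR_CHECK")

def digest_multiple(hits, threshold=1):
    """True when more than `threshold` digest rules hit.

    `hits` maps rule names to booleans; a rule that is missing from the
    mapping (plugin not enabled) is treated exactly like a rule that
    did not hit.
    """
    count = sum(1 for rule in DIGEST_RULES if hits.get(rule, False))
    return count > threshold

# DCC plugin disabled: DCC_CHECK is simply absent from the results.
print(digest_multiple({"RAZOR2_CHECK": True, "PYZOR_CHECK": True}))   # True
print(digest_multiple({"RAZOR2_CHECK": True}))                        # False
```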

   Loren




Re: Ruleset load order dependencies

2008-01-22 Thread byrnejb



 No.  The problem is that you don't have the modules loaded which would let the
 rules get defined.  The meta dependencies are checked after everything has
 loaded.

How do I ensure that the proper modules are loaded and what are they called?



Re: Ruleset load order dependencies

2008-01-22 Thread byrnejb



 You don't have the DCC plugin enabled, so the DCC_CHECK rule doesn't exist.
 It is surrounded by #ifplugin lines.


OK, I modified v310.pre and I will see if that works




Re: more efficent big scoring

2008-01-22 Thread Loren Wilton

 John D. Hardin writes:

  Loren mentioned to me in a private email: common subexpressions.

 Whoops! Matt Kettler mentioned it to me, not Loren. Sorry!


I was going to mention that I didn't think that had been me.
Unless I was asleep when I wrote the reply.  Which could have been the case. 
:-)


   Loren




Re: more efficent big scoring

2008-01-22 Thread Loren Wilton

  maybe if there was some way to establish a hierarchy at startup
  which groups rule processing into nodes. some nodes finish
  quickly, some have dependencies, some are negative, etc.

 Just wanted to point out, this topic came up when our site DNS
 cache service started to fail due to excessive dnsbl queries. My
 slowdown was due to multiple timeouts and/or delay, probably
 related to answering joe-job rbldns backscatter -- that's the
 reason I was looking for early exit on scans in process.


Some splitting of rules into processing-speed groups is already done. 
Specifically, the net-based tests, being dependent on external events for 
completion, are split out from the other tests and are processed in two 
phases.  The first phase issues the request for information over the net, 
and the second phase then waits for an answer.  There is a background 
routine that is harvesting incoming net results while other rules are 
processed, so when a net result is required it may already be present and no 
delay will be incurred.


This is not an area I fully understand, but reading moderately recent 
comments on Bugzilla leads me to believe that this is an area where some 
improvement is still possible; there are some net tests that (I think) end 
up waiting immediately for an answer rather than doing the two-phase 
processing.  How much that slows down the result for the overall email 
probably depends on many factors.


Also note that even issuing the requests and then waiting for the result 
only when it is needed doesn't guarantee that the mail will not have to wait 
for results.  It could be that one of the very first rules processed (due to 
priority or meta dependency, for instance) will need a net result, and so 
the entire rule process will be forced to wait on it.
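
A toy version of the two-phase idea, to make the shape concrete. This is only a sketch under my own assumptions: the lookups are faked with sleeps, a thread pool stands in for the background harvesting, and every name in it (fake_net_lookup, run_local_rules, LOCAL_KEYWORD, RBL_A, RBL_B) is invented for the example and has nothing to do with SpamAssassin internals.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_net_lookup(name, delay):
    """Stand-in for a network test; sleeps instead of querying anything."""
    time.sleep(delay)
    return name, True

def run_local_rules(message):
    """Stand-in for the regex rules, which keep the CPU busy in the meantime."""
    return {"LOCAL_KEYWORD": "viagra" in message.lower()}

def scan(message, net_tests):
    with ThreadPoolExecutor(max_workers=max(1, len(net_tests))) as pool:
        # Phase 1: issue every network request up front.
        pending = [pool.submit(fake_net_lookup, n, d) for n, d in net_tests]
        # Local rules run while the lookups are in flight.
        results = run_local_rules(message)
        # Phase 2: block only on answers that have not arrived yet.
        for future in pending:
            name, hit = future.result()
            results[name] = hit
    return results

print(scan("Cheap Viagra here", [("RBL_A", 0.05), ("RBL_B", 0.05)]))
```

If a rule needed RBL_A before the local rules had run, the scan would have to block on that one future straight away, which is the early-wait case described above.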


As far as splitting non-net rules up based on speed, that isn't very 
practical.  Regex rules should in general be quite fast, and all of them are 
going to require the use of the processor full-time anyway.  The speed of 
the rule will depend on how it is written and the exact content of the email 
it is processing.  So a rule that is dog slow on one email may be blindingly 
fast on most other emails.  I don't know that there is any good way to 
estimate the speed of a regex simply by looking at it.
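
A small demonstration of the same regex being fast on one input and slow on another. The pattern and the test strings are my own textbook example, not a SpamAssassin rule, and the timings will vary by machine, so no numbers are quoted.

```python
import re
import time

# Nested quantifiers: harmless on text that matches, costly on a near miss.
PATTERN = re.compile(r"(a+)+$")

def timed_match(text):
    """Return (matched, seconds) for one attempt to match PATTERN."""
    start = time.perf_counter()
    matched = PATTERN.match(text) is not None
    return matched, time.perf_counter() - start

friendly = "a" * 20          # matches on the first attempt
hostile = "a" * 20 + "!"     # fails only after heavy backtracking

for label, text in (("friendly", friendly), ("hostile", hostile)):
    matched, seconds = timed_match(text)
    print(f"{label:9s} matched={matched} seconds={seconds:.6f}")
```

Nothing about the pattern changes between the two runs; only the input does.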


   Loren




Re: Ruleset load order dependencies

2008-01-22 Thread Loren Wilton
  You don't have the DCC plugin enabled, so the DCC_CHECK rule doesn't
  exist.  It is surrounded by #ifplugin lines.

 OK, I modified v310.pre and I will see if that works


Note that some of the net checks require more setup than simply removing the 
hash mark from the ifplugin line.  You may need extra modules loaded, or may 
need extra setup either in SA or someplace else.
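
As a quick first check, something like the following sketch reports which of the three digest plugins a .pre file actually loads. It assumes the file uses lines of the form "loadplugin Mail::SpamAssassin::Plugin::NAME"; the sample text is made up for the example and is not a copy of any shipped v310.pre. It only tells you whether the plugin line is active, not whether the external tools behind it are installed or configured.

```python
import re

WANTED = ("DCC", "Pyzor", "Razor2")

def plugin_status(pre_text):
    """Map each wanted plugin to True (loadplugin active) or False."""
    status = {name: False for name in WANTED}
    for line in pre_text.splitlines():
        # A leading '#' means the line is commented out, so it will not match.
        m = re.match(r"\s*loadplugin\s+Mail::SpamAssassin::Plugin::(\w+)", line)
        if m and m.group(1) in status:
            status[m.group(1)] = True
    return status

# Hypothetical file contents, for illustration only.
sample = """\
# DCC - perform DCC message checks.
#loadplugin Mail::SpamAssassin::Plugin::DCC
loadplugin Mail::SpamAssassin::Plugin::Pyzor
#loadplugin Mail::SpamAssassin::Plugin::Razor2
"""

for name, loaded in plugin_status(sample).items():
    print(f"{name:7s} {'loaded' if loaded else 'not loaded'}")
```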


   Loren




Re: Google link spam?

2008-01-22 Thread McDonald, Dan

On Tue, 2008-01-22 at 17:31 -0800, John D. Hardin wrote:
 On Tue, 22 Jan 2008, Mike Yrabedra wrote:
 
  Is anyone else getting these google link spams?
 
I've not had any complaints about them sneaking past the existing rules.

 Yes, we've been discussing them for the past week.
 
 It's a good idea to check the list archives before asking if there are 
 rules for a particular type of spam.
 
  Anyone got a rule to kill these?


I've run John Hardin's rule all afternoon, and from amongst about 12000
spams I only saw two that hit:

Jan 22 17:29:23 sa amavis[16122]: (16122-14) SPAM,
[EMAIL PROTECTED] - [EMAIL PROTECTED], Yes,
score=7.843 tag=-99 tag2=4.5 kill=6.31 tests=[BODY_ENHANCEMENT=1.608,
DNS_FROM_RFC_BOGUSMX=2.125, GOOG_MALWARE_URI=0.1, L_P0F_W=1, RELAY_CN=3,
RELAY_US=0.01], autolearn=disabled, quarantine OOrIFqr7nOr2
(spam-quarantine)
Jan 22 17:30:22 sa amavis[16422]: (16422-19) SPAM,
[EMAIL PROTECTED] - [EMAIL PROTECTED], Yes,
score=7.843 tag=-99 tag2=4.5 kill=6.31 tests=[BODY_ENHANCEMENT=1.608,
DNS_FROM_RFC_BOGUSMX=2.125, GOOG_MALWARE_URI=0.1, L_P0F_W=1, RELAY_CN=3,
RELAY_US=0.01], autolearn=disabled, quarantine hiQD+uJgfngb
(spam-quarantine)

Both were detected without the rule.  I'll watch it for the remainder of
the week before I decide whether I should keep it.
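
For the arithmetic behind "detected without the rule", here is a short sketch that parses the tests list from the log lines above and subtracts the new rule's score. The variable and function names are mine; the scores and the kill level are copied from the log.

```python
import re

# tests=[...] and kill= values copied from the amavis log lines above.
LOG_TESTS = ("BODY_ENHANCEMENT=1.608, DNS_FROM_RFC_BOGUSMX=2.125, "
             "GOOG_MALWARE_URI=0.1, L_P0F_W=1, RELAY_CN=3, RELAY_US=0.01")
KILL_LEVEL = 6.31

def parse_tests(text):
    """Turn 'NAME=score, NAME=score' into a dict of floats."""
    return {name: float(score)
            for name, score in re.findall(r"(\w+)=(-?[\d.]+)", text)}

scores = parse_tests(LOG_TESTS)
total = sum(scores.values())
without_rule = total - scores["GOOG_MALWARE_URI"]

print(f"total={total:.3f} without GOOG_MALWARE_URI={without_rule:.3f} "
      f"still over kill level: {without_rule >= KILL_LEVEL}")
```

The total works out to the logged 7.843, and removing the 0.1 from GOOG_MALWARE_URI leaves 7.743, still above the kill level of 6.31.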

-- 
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com





RE: whois plugin .. where to get it

2008-01-22 Thread ram
On Tue, 2008-01-22 at 11:34 -0500, Chris Santerre wrote:
  As far as blacklisting entire registrars, can you   
  tell us any registrars that are 100% bad?  I can't. 
   
  Jeff C. 
  
 
 Allegedly 100% spam. Innocent until proven guilty, etc. 
 
 NUCLEAR NAMES, INC. 
 RED PILLAR, INC. 
 MOUZZ INTERACTIVE INC. 
 NAMEVIEW, INC. 
 SOLID HUB, INC. 
 COMPANA, LLC 
 RED REGISTER, INC. 
 CRISP NAMES, INC. 
 DOMAIN MODE, INC. 
 Regtime Ltd 
 TAHOE DOMAINS, INC. 
 Velnet UK Limited t/a velnet UK Ltd 
 
 --Chris
 

I would love to block all domains from these registrars, but come to think of
it, what is there to prevent them from getting themselves whitelisted by
registering good domains?
They can register one more domain with an innocent website (say a wiki
news site), and now they are less than 100% spammer registrars.



Thanks
Ram