Re: Ruleset load order dependencies

2008-01-23 Thread Matus UHLAR - fantomas
On 22.01.08 17:34, byrnejb wrote:
 No.  The problem is that you don't have the modules loaded which would let
 the
 rules get defined.  The meta dependencies are checked after everything has
 loaded.
 
 -- 
 
 How do I ensure that the proper modules are loaded and what are they called?

Please learn to quote, and don't indent original text with the "-- " string;
that is a signature separator.

Uncomment the Razor, Pyzor and DCC modules. Note that they all need external
programs to work, and DCC requires running your own DCC server for mailservers
handling 100k messages per day.
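For reference, enabling them means uncommenting the corresponding loadplugin lines in the .pre files shipped with SpamAssassin (typically v310.pre; the exact path varies by distribution):

```
loadplugin Mail::SpamAssassin::Plugin::DCC
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
```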
-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
42.7 percent of all statistics are made up on the spot. 


Re: Feeding SA-learn

2008-01-23 Thread Diego Pomatta

Anthony Peacock escribió:

Can I feed a plain text file representing just the body
of a message to sa-learn?

/Diego



Yes you can; who's to stop it?

I just sent your message body with --ham, and it reported that it learned one 
message.


  

I meant without the headers, just the body.
ok thanks



Well the short answer is, yes you can.

The slightly longer answer is that you won't get as good results doing 
this, as the Bayes system uses tokens found in the complete message.  
By only learning on the body you will not gain any advantage for 
tokens found in headers.





Yep, I know; the problem is precisely that I don't have the original 
headers after the mail has been delivered.
My intention was to manually feed the few spam messages that slip through 
undetected. By the time I get hold of those, they are in the 
recipient's mail client inbox, not on the server.
I was thinking, if I save the mail as EML files, would that preserve the 
headers in a way that sa-learn can parse correctly?


Thanks
/Diego


Re: Feeding SA-learn

2008-01-23 Thread Anthony Peacock

Diego Pomatta wrote:

Anthony Peacock escribió:

Can I feed a plain text file representing just the body
of a message to sa-learn?

/Diego



Yes you can; who's to stop it?

I just sent your message body with --ham, and it reported that it learned one 
message.


  

I meant without the headers, just the body.
ok thanks



Well the short answer is, yes you can.

The slightly longer answer is that you won't get as good results doing 
this, as the Bayes system uses tokens found in the complete message.  
By only learning on the body you will not gain any advantage for 
tokens found in headers.





Yep, I know; the problem is precisely that I don't have the original 
headers after the mail has been delivered.
My intention was to manually feed the few spam messages that slip through 
undetected. By the time I get hold of those, they are in the 
recipient's mail client inbox, not on the server.
I was thinking, if I save the mail as EML files, would that preserve the 
headers in a way that sa-learn can parse correctly?



Depends on the client.

For instance, Thunderbird stores its folders in mbox format, so 
sa-learn can work against those files as-is.  Other email clients can 
save emails in text format complete with headers.


The biggest problem with this is training the users to do that consistently.


--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW: http://www.chime.ucl.ac.uk/~rmhiajp/
"A CAT scan should take less time than a PET scan.  For a CAT scan,
 they're only looking for one thing, whereas a PET scan could result in
 a lot of things." - Carl Princi, 2002/07/19


Re: more efficent big scoring

2008-01-23 Thread Justin Mason

To clarify -- here's how the current code orders rule evaluation:

- message metadata is extracted.

- header DNSBL tests are started.

- the decoded forms of the body text are extracted and cached.

- the URIs in the message body are extracted and cached.

- it iterates through each known priority level, defined in the active
  ruleset, from lowest to highest, and:

  - checks whether this priority level has been shortcircuited; if it has, it breaks out of the loop

  - calls the 'check_rules_at_priority' plugin hook

  - runs all header and header-eval rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all body and body-eval rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all uri rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all rawbody and rawbody-eval rules defined for that priority
level
  - checks to see if there are network rules to harvest
  - runs all full and full-eval rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all meta rules defined for that priority level (note: if the
meta rules depend on a network rule, this may block until that rule
completes)
  - checks to see if there are network rules to harvest

  - calls the check_tick plugin hook

- finally, it waits for any remaining unharvested network rules (if it
  hasn't shortcircuited)

- calls the check_post_dnsbl plugin hook

- auto-learns from the message, if applicable

- calls the check_post_learn plugin hook

- and returns
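For illustration, the priority loop above can be sketched in Python (names, rule types, and the data layout are illustrative; the real implementation is the Perl check_main() in the Check plugin):

```python
def run_checks(rules_by_priority, shortcircuited=lambda: False,
               harvest=lambda: None):
    """Sketch of the evaluation order described above (illustrative,
    not the actual Mail::SpamAssassin::Plugin::Check code)."""
    ran = []
    # iterate known priority levels from lowest to highest
    for priority in sorted(rules_by_priority):
        if shortcircuited():   # break the loop if a rule shortcircuited
            break
        # within one priority level, rule types run in this fixed order,
        # harvesting completed network results between each batch
        for rtype in ("header", "body", "uri", "rawbody", "full", "meta"):
            for name in rules_by_priority[priority].get(rtype, []):
                ran.append((priority, rtype, name))
            harvest()          # pick up any finished DNSBL answers
    return ran

# hypothetical ruleset: a negative-priority shortcircuit rule runs first
rules = {
    -100: {"header": ["SHORTCIRCUIT_HAM"]},
    0:    {"body": ["SOME_BODY_RULE"], "meta": ["SOME_META"]},
}
order = run_checks(rules)
print(order)
```

Meta rules come last within each level, which is why a meta depending on a network rule may block there until that result arrives.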


In 3.2.x and 3.3.0 this is all in the Check plugin, in the check_main()
method, so it can be redefined or overridden with alternative orderings
quite easily.

--j.

Loren Wilton writes:
   maybe if there was some way to establish a hierarchy at startup
   which groups rule processing into nodes. some nodes finish
   quickly, some have dependencies, some are negative, etc.
 
  Just wanted to point out, this topic came out when site dns
  cache service started to fail due to excessive dnsbl queries. My
  slowdown was due to multiple timeouts and/or delay, probably
  related to answering joe-job rbldns backscatter -- that's the
  reason I was looking for early exit on scans in process.
 
 There is already a little splitting of rules into processing-speed groups done. 
 Specifically, the net-based tests, being dependent on external events for 
 completion, are split out from the other tests and are processed in two 
 phases.  The first phase issues the request for information over the net, 
 and the second phase then waits for an answer.  There is a background 
 routine that is harvesting incoming net results while other rules are 
 processed, so when a net result is required it may already be present and no 
 delay will be incurred.
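That two-phase pattern (issue all requests early, harvest each answer only when needed) can be sketched with a thread pool; this is purely illustrative, since SpamAssassin actually uses asynchronous DNS in Perl, not threads:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def dns_lookup(host):
    """Stand-in for a DNSBL query with some network latency."""
    time.sleep(0.05)
    return f"{host} -> 127.0.0.2"

hosts = ["a.example", "b.example", "c.example"]
with ThreadPoolExecutor() as pool:
    # phase 1: fire off all queries without waiting for any of them
    futures = {h: pool.submit(dns_lookup, h) for h in hosts}
    # ... CPU-bound regex rules would run here, overlapping the I/O ...
    # phase 2: collect each answer only when a rule needs it; by then
    # it may already have completed, so no extra delay is incurred
    answers = {h: f.result() for h, f in futures.items()}
print(answers["a.example"])
```

The failure mode Loren describes maps directly onto this sketch: if the very first rule calls .result(), the whole run blocks on that one lookup.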
 
 This is not an area I understand at all fully, but reading moderately recent 
 comments on Bugzilla leads me to believe that this is an area where some 
 improvement is still possible; there are some net tests that (I think) end 
 up waiting immediately for an answer rather than doing the two-phase 
 processing.  How much that slows down the result for the overall email 
 probably depends on many factors.
 
 Also note that even issuing the requests and then waiting for the result 
 only when it is needed doesn't guarantee that the mail will not have to wait 
 for results.  It could be that one of the very first rules processed (due to 
 priority or meta dependency, for instance) will need a net result, and so 
 the entire rule process will be forced to wait on it.
 
 As far as splitting non-net rules up based on speed, that isn't very 
 practical.  Regex rules should in general be quite fast, and all of them are 
 going to require the use of the processor full-time anyway.  The speed of 
 the rule will depend on how it is written and the exact content of the email 
 it is processing.  So a rule that is dog slow on one email may be blindingly 
 fast on most other emails.  I don't know that there is any good way to 
 estimate the speed of a regex simply by looking at it.
 
  Loren


Re: Feeding SA-learn

2008-01-23 Thread Diego Pomatta

Anthony Peacock escribió:

Well the short answer is, yes you can.

The slightly longer answer is that you won't get as good results 
doing this, as the Bayes system uses tokens found in the complete 
message.  By only learning on the body you will not gain any 
advantage for tokens found in headers.





Yep, I know; the problem is precisely that I don't have the original 
headers after the mail has been delivered.
My intention was to manually feed the few spam messages that slip 
through undetected. By the time I get hold of those, they are in the 
recipient's mail client inbox, not on the server.
I was thinking, if I save the mail as EML files, would that preserve 
the headers in a way that sa-learn can parse correctly?



Depends on the client.

For instance, Thunderbird stores its folders in mbox format, so 
sa-learn can work against those files as-is. Other email clients can 
save emails in text format complete with headers.
I use Thunderbird. There are two files for that folder: Junk.msf (7k) 
and Junk (53.172k). The .msf file must be some kind of index. Do I just 
feed the bigger one to sa-learn?

/Regards


Re: Feeding SA-learn

2008-01-23 Thread Anthony Peacock

Diego Pomatta wrote:

Anthony Peacock escribió:

Well the short answer is, yes you can.

The slightly longer answer is that you won't get as good results 
doing this, as the Bayes system uses tokens found in the complete 
message.  By only learning on the body you will not gain any 
advantage for tokens found in headers.





Yep, I know; the problem is precisely that I don't have the original 
headers after the mail has been delivered.
My intention was to manually feed the few spam messages that slip 
through undetected. By the time I get hold of those, they are in the 
recipient's mail client inbox, not on the server.
I was thinking, if I save the mail as EML files, would that preserve 
the headers in a way that sa-learn can parse correctly?



Depends on the client.

For instance, Thunderbird stores its folders in mbox format, so 
sa-learn can work against those files as-is. Other email clients can 
save emails in text format complete with headers.
I use Thunderbird. There are two files for that folder: Junk.msf (7k) 
and Junk (53.172k). The .msf file must be some kind of index. Do I just 
feed the bigger one to sa-learn?


Yes, the .msf file is an index file.  I just copy the mbox file (Junk in 
your case) to the server and run the following command specifying the 
filename (as shown):


/usr/local/bin/spamassassin --report --mbox Junk
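If you want to sanity-check that a copied file really is a parseable mbox with its headers intact before feeding it to sa-learn, Python's stdlib mailbox module can read it; here against a small synthetic Junk file (the messages are made up for the example):

```python
import mailbox
import os
import tempfile

# build a tiny two-message mbox the way a client like Thunderbird would
raw = (
    "From sender@example.com Wed Jan 23 10:00:00 2008\n"
    "From: sender@example.com\nSubject: test one\n\nbody one\n\n"
    "From other@example.com Wed Jan 23 11:00:00 2008\n"
    "From: other@example.com\nSubject: test two\n\nbody two\n"
)
path = os.path.join(tempfile.mkdtemp(), "Junk")
with open(path, "w") as f:
    f.write(raw)

# parse it: each "From " separator line starts a new message
box = mailbox.mbox(path)
count = len(box)
subjects = [msg["Subject"] for msg in box]
print(count, subjects)
```

If the count and headers look right, the same file should train cleanly with sa-learn --mbox.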



--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW: http://www.chime.ucl.ac.uk/~rmhiajp/
"A CAT scan should take less time than a PET scan.  For a CAT scan,
 they're only looking for one thing, whereas a PET scan could result in
 a lot of things." - Carl Princi, 2002/07/19


Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-23 Thread Rubin Bennett

On Tue, 2008-01-22 at 12:49 -0600, Michael Parker wrote:
 On Jan 22, 2008, at 12:17 PM, Rubin Bennett wrote:
 
 
  On Tue, 2008-01-22 at 10:45 -0600, Michael Parker wrote:
  On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:
 
  WTF am I doing wrong?!
 
  Not including debug logs in your message.
 
  User prefs does not work with spamassassin, so you won't see anything
  there, but you should be seeing something for Bayes SQL and AWL SQL  
  if
  they are configured correctly.
 
  What do you mean?!  Isn't that what the user_scores_dsn is all about?!
 
 
 The spamassassin script.  User prefs only work when you run via  
 spamd.  But let's look at the debug output:
 
 
  [31490] dbg: bayes: using username: root
  [31490] dbg: bayes: database connection established
  [31490] dbg: bayes: found bayes db version 3
  [31490] dbg: bayes: Using userid: 1
 
 Ok, this tells me that Bayes SQL looks to be running just fine.  If  
 you read sql/README.bayes it tells you what to look for to test if  
 things are working correctly.
 
 
  [31490] dbg: bayes: corpus size: nspam = 2106, nham = 19051
  [31490] dbg: bayes: tok_get_all: token count: 20
  [31490] dbg: bayes: score = 0.472224419305046
  [31490] dbg: bayes: DB expiry: tokens in DB: 133258, Expiry max size:
  15, Oldest atime: 1193647841, Newest atime: 1201025739, Last  
  expire:
  1195029791, Current time: 1201025739
 
 It even looks like you've got some data in there.
 
You're right, it does appear to be connecting to the database for bayes.

Spamd output below:
[EMAIL PROTECTED] ~]# spamd -q -D
[12373] dbg: logger: adding facilities: all
[12373] dbg: logger: logging level is DBG
[12373] dbg: logger: trying to connect to syslog/unix...
[12373] dbg: logger: opening syslog with unix socket
[12373] dbg: logger: successfully connected to syslog/unix
[12373] dbg: logger: successfully added syslog method
[12373] dbg: spamd: will perform setuids? 1
[12373] dbg: spamd: creating INET socket:
[12373] dbg: spamd: Listen: 128
[12373] dbg: spamd: LocalAddr: 127.0.0.1
[12373] dbg: spamd: LocalPort: 783
[12373] dbg: spamd: Proto: 6
[12373] dbg: spamd: ReuseAddr: 1
[12373] dbg: spamd: Type: 1
[12373] dbg: logger: adding facilities: all
[12373] dbg: logger: logging level is DBG
[12373] dbg: generic: SpamAssassin version 3.2.3
[12373] dbg: config: score set 0 chosen.
[12373] dbg: dns: is Net::DNS::Resolver available? yes
[12373] dbg: dns: Net::DNS version: 0.61
[12373] dbg: learn: initializing learner
[12373] dbg: config: using /etc/mail/spamassassin for site rules pre
files
[12373] dbg: config: read file /etc/mail/spamassassin/init.pre
[12373] dbg: config: read file /etc/mail/spamassassin/v310.pre
[12373] dbg: config: read file /etc/mail/spamassassin/v312.pre
[12373] dbg: config: read file /etc/mail/spamassassin/v320.pre
[12373] dbg: config: using /var/lib/spamassassin/3.002003 for sys
rules pre files
[12373] dbg: config: using /var/lib/spamassassin/3.002003 for default
rules dir
[12373] dbg: config: read
file /var/lib/spamassassin/3.002003/updates_spamassassin_org.cf
[12373] dbg: config: using /etc/mail/spamassassin for site rules dir
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_adult.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_evilnum0.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_evilnum1.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_evilnum2.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_genlsubj.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_genlsubj_eng.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_genlsubj_x30.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_header.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_header0.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_header_eng.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_header_x264_x30.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_header_x30.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_highrisk.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_html.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_html_eng.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_html_x30.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_oem.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_random.cf
[12373] dbg: config: read
file /etc/mail/spamassassin/70_sare_specific.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_spoof.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_stocks.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_unsub.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_uri0.cf
[12373] dbg: config: read 

Re: Feeding SA-learn

2008-01-23 Thread Mark Johnson

Depends on the client.

For instance, Thunderbird stores its folders in mbox format, so 
sa-learn can work against those files as-is. Other email clients can 
save emails in text format complete with headers.
I use Thunderbird. There are two files for that folder: Junk.msf (7k) 
and Junk (53.172k). The msf file must be some kind of index. I just 
feed the biggest one to sa-learn?


Yes, the .msf file is an index file.  I just copy the mbox file (Junk in 
your case) to the server and run the following command specifying the 
filename (as shown):


/usr/local/bin/spamassassin --report --mbox Junk



I use Thunderbird as my mail client but have found that I needed to use 
Evolution to save the messages in mbox format, which was always a hassle.


My emails are stored on an IMAP server and what you suggested wasn't 
working for me.  I had the .msf file, but no corresponding mbox file. 
Because the emails are kept on the IMAP server and are not local, I had 
to enable the "Select this folder for offline use" option on the "Offline" 
tab of the folder properties.  I then had the mbox file that I could copy off.


--
Mark Johnson
http://www.astroshapes.com/information-technology/blog/



Re: Feeding SA-learn

2008-01-23 Thread Anthony Peacock

Mark Johnson wrote:

Depends on the client.

For instance, Thunderbird stores its folders in mbox format, so 
sa-learn can work against those files as-is. Other email clients can 
save emails in text format complete with headers.
I use Thunderbird. There are two files for that folder: Junk.msf (7k) 
and Junk (53.172k). The msf file must be some kind of index. I just 
feed the biggest one to sa-learn?


Yes, the .msf file is an index file.  I just copy the mbox file (Junk 
in your case) to the server and run the following command specifying 
the filename (as shown):


/usr/local/bin/spamassassin --report --mbox Junk



I use Thunderbird as my mail client but have found that I needed to use 
Evolution to save the messages in mbox format, which was always a hassle.


My emails are stored on an IMAP server and what you suggested wasn't 
working for me.  I had the .msf file, but no corresponding mbox file. 
Because the emails are kept on the IMAP server and are not local, I had 
to enable the "Select this folder for offline use" option on the "Offline" 
tab of the folder properties.  I then had the mbox file that I could copy off.


Good point, I use this on folders that are saved on the local hard disk.

--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW: http://www.chime.ucl.ac.uk/~rmhiajp/
"A CAT scan should take less time than a PET scan.  For a CAT scan,
 they're only looking for one thing, whereas a PET scan could result in
 a lot of things." - Carl Princi, 2002/07/19


Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-23 Thread Michael Parker


On Jan 23, 2008, at 6:37 AM, Rubin Bennett wrote:



Spamd output below:[EMAIL PROTECTED] ~]# spamd -q -D
[12373] dbg: logger: adding facilities: all
[12373] dbg: logger: logging level is DBG


Can you run this again, and this time pass 1-2 msgs through just like  
you would normally, instead of just the default prime-the-pump  
message.  Also, please remind me of your spamd startup options, and  
maybe even attach your local.cf (or wherever you're adding the sql  
config items) file for good measure.


Michael


Feedback on 3.2.4

2008-01-23 Thread Skip
Other than the initial reports of performance boost from 3.2.4, I haven't
seen much discussion on it as yet.  Perhaps it is still too soon to know,
but has anyone been seeing other benefits - or identified potential
problems?

- Skip



RE: more efficent big scoring

2008-01-23 Thread Robert - elists

 
 Just wanted to point out, this topic came out when site dns
 cache service started to fail due to excessive dnsbl queries. My
 slowdown was due to multiple timeouts and/or delay, probably
 related to answering joe-job rbldns backscatter -- that's the
 reason I was looking for early exit on scans in process.
 
 // George
 George Georgalis, information system scientist IXOYE

George, that is correct!

I still maintain that the SA Team is more than bright and talented enough
that, over time, they will come up with new algorithms to allow this behavior
without a substantial SA processing-speed decrease.

I just cannot imagine that the theoretical and real world limits of these
functions have been met yet.

And the bottom line is: even if a function initially slows down processing,
if the theory behind it is sound, shouldn't it be pursued until it can
be implemented properly?

... or is that just wishful thinking?

jdow has suggested some reading to enlighten me that I will be getting to in
short order.

 - rh



RE: whois plugin .. where to get it

2008-01-23 Thread John D. Hardin
On Wed, 23 Jan 2008, ram wrote:

  Allegedly 100% spam. Innocent until proven guilty, etc. 
  
  NUCLEAR NAMES, INC. 
 
 I would love to block all domains with these, but come to think of it, what
 is there to prevent them from getting themselves whitelisted by
 registering good domains?

There's a lot of difference between "not blacklisted" and 
"whitelisted". Not using a spam-friendly registrar does not mean they 
will get a pass, just that they won't get points for having used a 
spam-friendly registrar.
 
 They can register one more domain with an innocent website (say a
 wiki news site)  etc Now they are less than 100% spammer
 registrars

Oh, I see what you mean.

Adding some score for "spam-friendly" (not necessarily 100% spammy) is
reasonable; using the registrar as a poison pill is not. Plus, what
legitimate domain owner would wish to knowingly register their domain
with a registrar that has a truly bad reputation? Besides, they can leave
that registrar rather easily.

--
 John Hardin KA7OHZ  http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]  FALaholic #11174  pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If Microsoft made hammers, everyone would whine about how poorly
  screws were designed and about how they are hard to hammer in, and
  wonder why it takes so long to paint a wall using the hammer.
---
 4 days until the 41st anniversary of the loss of Apollo 1



Re: Feedback on 3.2.4

2008-01-23 Thread Rick Macdougall

Skip wrote:

Other than the initial reports of performance boost from 3.2.4, I haven't
seen much discussion on it as yet.  Perhaps it is still too soon to know,
but has anyone been seeing other benefits - or identified potential
problems?



No problems with it at all here (around 7 servers upgraded) and the 
performance is greatly increased.  I went from a 1.4 second average scan 
time to 0.6 seconds average.


HTH,

Rick



Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-23 Thread Rubin Bennett
Here you go, and thanks!
Output of spamd -q -D, left running for a while.

Well... it reveals that it is in fact pulling my userprefs from SQL, but
it was ignoring the ones with the $GLOBAL username.  Apparently, it now
requires the @GLOBAL username instead (which IIRC it didn't at some
point in the past).

So... it is working, and I thank you all for your input.

Lesson learned: for MySQL configs, use:
spamd -q -x -d
Make sure your GLOBAL config in mysql is set for the @GLOBAL user.
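The fix amounts to a one-line UPDATE on the userpref table; sketched here with sqlite3 standing in for MySQL, assuming the username/preference/value layout from SA's sql/userpref_mysql.sql:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE userpref (username TEXT, preference TEXT, value TEXT)")
# old-style global row that this spamd version ignores
db.execute("INSERT INTO userpref VALUES ('$GLOBAL', 'required_score', '5.0')")
# rename it to the @GLOBAL form that spamd now looks up
db.execute("UPDATE userpref SET username = '@GLOBAL' WHERE username = '$GLOBAL'")
row = db.execute(
    "SELECT value FROM userpref WHERE username = '@GLOBAL' "
    "AND preference = 'required_score'").fetchone()
print(row[0])
```

Against a real MySQL backend the equivalent is a single UPDATE statement on the userpref table.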

Rubin

On Wed, 2008-01-23 at 10:35 -0600, Michael Parker wrote:
 On Jan 23, 2008, at 6:37 AM, Rubin Bennett wrote:
 
 
  Spamd output below:[EMAIL PROTECTED] ~]# spamd -q -D
  [12373] dbg: logger: adding facilities: all
  [12373] dbg: logger: logging level is DBG
 
 Can you run this again and this time pass 1-2 msgs through just like  
 you would normally, instead of just the default prime-the-pump  
 message.  Also, please remind me of you spamd startup options and  
 maybe even attach your local.cf (or where ever you're adding the sql  
 config items) file for good measure.
 
 Michael



Expiry problem

2008-01-23 Thread Steven Stern
We had a server go crazy last night and reset its date into August of 
2277.  In any case, we've resolved that, but now I can't get bayes to 
expire.


After the clock was correctly set, I deleted all tokens that had a 
lastupdate in the future, and also removed similar bayes_seen rows.  I 
then reset the token count in bayes_vars to the correct value.


When I try to run sa-learn --force-expire, nothing gets expired and the 
token list keeps growing.  Will this get better on its own or do I need 
to intervene?


[14256] dbg: bayes: using username: root
[14256] dbg: bayes: database connection established
[14256] dbg: bayes: found bayes db version 3
[14256] dbg: bayes: Using userid: 1
[14256] dbg: config: score set 3 chosen.
[14256] dbg: learn: initializing learner
[14256] dbg: bayes: bayes journal sync starting
[14256] dbg: bayes: bayes journal sync completed
[14256] dbg: bayes: expiry starting
[14256] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[14256] dbg: bayes: token count: 443162, final goal reduction size: 330662
[14256] dbg: bayes: first pass? current: 1201117198, Last: 1201117194, 
atime: 43200, count: 1231, newdelta: 160, ratio: 268.612510154346, 
period: 43200
[14256] dbg: bayes: can't use estimation method for expiry, unexpected 
result, calculating optimal atime delta (first pass)

[14256] dbg: bayes: expiry max exponent: 9
[14256] dbg: bayes: atime token reduction
[14256] dbg: bayes:  ===
[14256] dbg: bayes: 43200 528
[14256] dbg: bayes: 86400 0
[14256] dbg: bayes: 172800 0
[14256] dbg: bayes: 345600 0
[14256] dbg: bayes: 691200 0
[14256] dbg: bayes: 1382400 0
[14256] dbg: bayes: 2764800 0
[14256] dbg: bayes: 5529600 0
[14256] dbg: bayes: 11059200 0
[14256] dbg: bayes: 22118400 0
[14256] dbg: bayes: couldn't find a good delta atime, need more token 
difference, skipping expire

[14256] dbg: bayes: expiry completed
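For reference, the manual cleanup described at the top of the post (drop future-dated tokens, then resync the count in bayes_vars) looks roughly like this; sqlite3 stands in for the real database, and the column names are assumed from SA's bayes_mysql.sql schema, where the token timestamp column is atime:

```python
import sqlite3
import time

now = int(time.time())
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bayes_token (id INT, token BLOB, spam_count INT,"
           " ham_count INT, atime INT)")
db.execute("CREATE TABLE bayes_vars (id INT, token_count INT)")
db.execute("INSERT INTO bayes_vars VALUES (1, 3)")
# two sane tokens plus one stamped far in the future (the clock damage)
for atime in (now - 1000, now - 2000, now + 10**9):
    db.execute("INSERT INTO bayes_token VALUES (1, X'00', 1, 0, ?)", (atime,))

# drop tokens stamped in the future, then resync the token count
db.execute("DELETE FROM bayes_token WHERE atime > ?", (now,))
db.execute("UPDATE bayes_vars SET token_count ="
           " (SELECT COUNT(*) FROM bayes_token WHERE id = 1) WHERE id = 1")
count = db.execute("SELECT token_count FROM bayes_vars WHERE id = 1").fetchone()[0]
print(count)
```

Note this only mirrors the cleanup already performed; it does not explain why --force-expire still skips the expire, which the debug output attributes to too little atime spread among the remaining tokens.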


sa-learn errors.

2008-01-23 Thread Michael Hutchinson
Hi all. 

 

I have been having issues with SA for a while, with most of my requests for
help going unheard. I've managed to upgrade SA and fix most of the
errors, but am still getting a couple that I've not been able to fix
yet. Can someone please help with this?

 

We are running Debian Sarge with SA 3.1.7 from backports.

Whenever I try to train Bayes with sa-learn from our missed-spam
folder, I get these errors. And since I've tried to run them, I also
get these errors whenever I manually run spamd or SpamAssassin:

 

mailserver:~# sa-learn --spam
/home/vpopmail/domains/mail.ourdomain.net/spam/Maildir/.Missed\
Spam.20080124/cur/

Bareword MAX_URI_LENGTH not allowed while strict subs in use at
/usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2010.

Bareword MAX_URI_LENGTH not allowed while strict subs in use at
/usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2012.

Compilation failed in require at /usr/share/perl5/Mail/SpamAssassin.pm
line 72.

BEGIN failed--compilation aborted at
/usr/share/perl5/Mail/SpamAssassin.pm line 72.

Compilation failed in require at /usr/bin/sa-learn line 78.

BEGIN failed--compilation aborted at /usr/bin/sa-learn line 78.

 

I am used to getting similar sa-learn errors, but not ones that cause
problems when spamd or SpamAssassin is manually run. Can anyone please
explain what "strict subs" is used for, and whether I should disable it to
allow MAX_URI_LENGTH to work properly?

 

Cheers,

Michael Hutchinson

http://www.manux.co.nz  

 



Re: sa-learn errors.

2008-01-23 Thread John D. Hardin
On Thu, 24 Jan 2008, Michael Hutchinson wrote:

 Bareword MAX_URI_LENGTH not allowed while strict subs in use
 at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2010.
 
 Bareword MAX_URI_LENGTH not allowed while strict subs in use
 at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2012.
 
 Compilation failed in require at
 /usr/share/perl5/Mail/SpamAssassin.pm line 72.

Those are compile errors in the core SA code. Your install appears to
be corrupted. 

Has anyone been editing the files under /usr/share/perl5/Mail/ ?

You will probably need to wipe and reinstall SA from scratch. Note 
that your local rules and bayes database shouldn't be affected by 
doing this.
 
 I am used to getting similar sa-learn errors, but not ones that
 cause problems when spamd or Spamassassin is manually run.

You may have two different copies of SA installed, and one is bad. 
This can happen if you install SA from a distro package and then later 
attempt to install or upgrade from CPAN (or vice versa).

 Can anyone please define what strict subs is used for and if I
 should disable it to allow MAX_URI_LENGTH to work properly ?

Those are Perl language options; you shouldn't be fiddling around with 
that stuff unless you're an SA developer or you want to modify SA 
itself (as opposed to just creating rules or doing other common 
administrative tasks).

--
 John Hardin KA7OHZ  http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]  FALaholic #11174  pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  We are hell-bent and determined to allocate the talent, the
  resources, the money, the innovation to absolutely become a
  powerhouse in the ad business.   -- Microsoft CEO Steve Ballmer
  ...because allocating talent to securing Windows isn't profitable?
---
 4 days until Wolfgang Amadeus Mozart's 252nd Birthday



RE: sa-learn errors.

2008-01-23 Thread Michael Hutchinson

John wrote:

 Those are compile errors in the core SA code. Your install appears to
 be corrupted.
 
 Has anyone been editing the files under /usr/share/perl5/Mail/ ?
 
 You will probably need to wipe and reinstall SA from scratch. Note
 that your local rules and bayes database shouldn't be affected by
 doing this.

Thanks for the reply, John. I believe the problem is that more than
one perl5/Mail dir is hanging around on the system, which I will be
addressing shortly. Not that this should be an issue: SA is configured
to read one config directory, not two.

 You may have two different copies of SA installed, and one is bad.
 This can happen if you install SA from a distro package and then later
 attempt to install or upgrade from CPAN (or vice versa).

I'd say you've hit the nail on the head. I recently did an upgrade with
dpkg -i at the previous admin's recommendation. Methinks the method
used to install the original SA was different: probably compiled from
source rather than installed via CPAN.
 
 Those are Perl language options; you shouldn't be fiddling around with
 that stuff unless you're an SA developer or you want to modify SA
 itself (as opposed to just creating rules or doing other common
 administrative tasks).

Ok. That I can understand. Thank you very much for your response, John.
I now have a plan for the weekend: knock out SA completely and
re-install it after backing up the config and Bayes data.

Thanks again!
Cheers,
Michael.



Re: Feedback on 3.2.4

2008-01-23 Thread Jorge Valdes

Rick Macdougall wrote:

Skip wrote:
Other than the initial reports of performance boost from 3.2.4, I 
haven't
seen much discussion on it as yet.  Perhaps it is still too soon to 
know,

but has anyone been seeing other benefits - or identified potential
problems?



No problems with it at all here (around 7 servers upgraded) and the 
performance is greatly increased.  I went from a 1.4 second average 
scan time to 0.6 seconds average.


HTH,

Rick



Is this without network tests?
Because on my server I had

Begin   : 2008-01-01
End : 2008-01-15
Summary : 3.1.8

  Cnt     %%   Average      Min      Max
-----  -----  --------  -------  -------
18968  46.2%     7.837    1.861   10.000
16640  40.6%    13.654   10.001   19.999
 2916   7.1%    23.892   20.003   30.000
 1379   3.4%    38.132   30.002   59.882
  184   0.4%    74.994   60.041   89.753
   37   0.1%    99.552   90.282  118.884
  904   2.2%   154.578  120.272  364.923

Begin   : 2008-01-21
End : 2008-01-24
Summary : version 3.2.4

  Cnt     %%   Average      Min      Max
-----  -----  --------  -------  -------
 5302  44.9%     7.431    3.872   10.000
 4737  40.1%    13.643   10.002   19.998
  869   7.4%    24.003   20.008   29.982
  555   4.7%    41.017   30.001   59.947
  126   1.1%    72.529   60.201   89.941
   24   0.2%   101.170   90.641  118.022
  201   1.7%   154.700  120.454  188.119

Because, going just by the percentages, scan time is roughly the same on 
exactly the same hardware.
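A quick way to check that reading is to compute the count-weighted mean of the Average column from each table (figures copied from above):

```python
# (Cnt, Average) pairs from the 3.1.8 and 3.2.4 tables above
v318 = [(18968, 7.837), (16640, 13.654), (2916, 23.892), (1379, 38.132),
        (184, 74.994), (37, 99.552), (904, 154.578)]
v324 = [(5302, 7.431), (4737, 13.643), (869, 24.003), (555, 41.017),
        (126, 72.529), (24, 101.170), (201, 154.700)]

def weighted_mean(buckets):
    """Mean scan time, weighting each bucket's average by its count."""
    total = sum(cnt for cnt, _ in buckets)
    return sum(cnt * avg for cnt, avg in buckets) / total

print(round(weighted_mean(v318), 2), round(weighted_mean(v324), 2))
```

Both runs come out around sixteen seconds per message, so the two versions do scan at roughly the same speed on this hardware.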



--
Jorge Valdes




Re: Feedback on 3.2.4

2008-01-23 Thread Rick Macdougall

Jorge Valdes wrote:


No problems with it at all here (around 7 servers upgraded) and the 
performance is greatly increased.  I went from a 1.4 second average 
scan time to 0.6 seconds average.


HTH,

Rick



Is this without network tests?
Because on my server I had

Begin   : 2008-01-01
End : 2008-01-15
Summary : 3.1.8

   Cnt     %%  Average      Min      Max
------ ------ -------- -------- --------
 18968  46.2%    7.837    1.861   10.000
 16640  40.6%   13.654   10.001   19.999
  2916   7.1%   23.892   20.003   30.000
  1379   3.4%   38.132   30.002   59.882
   184   0.4%   74.994   60.041   89.753
    37   0.1%   99.552   90.282  118.884
   904   2.2%  154.578  120.272  364.923

Begin   : 2008-01-21
End : 2008-01-24
Summary : version 3.2.4

   Cnt     %%  Average      Min      Max
------ ------ -------- -------- --------
  5302  44.9%    7.431    3.872   10.000
  4737  40.1%   13.643   10.002   19.998
   869   7.4%   24.003   20.008   29.982
   555   4.7%   41.017   30.001   59.947
   126   1.1%   72.529   60.201   89.941
    24   0.2%  101.170   90.641  118.022
   201   1.7%  154.700  120.454  188.119

Because, judging by the percentages alone, scan time is roughly the same on 
exactly the same hardware.





Yup, full scanning including network tests and bayes stored in a network 
MySQL server.


Hardware is Dell 860s (I believe, could be 850) with 4 gigs of ram and 
no second CPU installed.


Regards,

Rick


Re: Feeding SA-learn

2008-01-23 Thread John Thompson
On 2008-01-23, Anthony Peacock [EMAIL PROTECTED] wrote:

 My intention was to manually feed the few spam messages that slip thru 
 undetected. By the time I get a hold of those, they are in the 
 recipient's mail client inbox, not in the server.
 I was thinking, if I save the mail as EML files, would that preserve the 
 headers in a way that sa-learn can parse correctly?

 Depends on the client.

 For instance, Thunderbird stores its folders in mbox format, so 
 sa-learn can work against those files as-is.  Other email clients can 
 save emails in text format complete with headers.

 The biggest problem with this is training the users to do that consistently.

Isn't that what cron is for? :-)

I have a cron job on my imap server to regularly feed ham and spam 
through sa-learn.
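A minimal crontab sketch of that idea; the schedule and the training-mailbox paths here are assumptions for illustration, not John's actual setup:

```
# Hypothetical crontab entries: learn overnight from dedicated training
# mailboxes (mbox format), discarding output.  Adjust paths to your spool.
30 2 * * *  sa-learn --spam --mbox /var/mail/training/spam  >/dev/null 2>&1
45 2 * * *  sa-learn --ham  --mbox /var/mail/training/ham   >/dev/null 2>&1
```

The mailboxes would be IMAP folders the users drag misclassified mail into, so "training the users" reduces to "move it to the right folder".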

-- 

John ([EMAIL PROTECTED])



RE: whois plugin .. where to get it

2008-01-23 Thread Giampaolo Tomassoni
 -Original Message-
 From: ram [mailto:[EMAIL PROTECTED]
 Sent: Monday, January 21, 2008 2:36 PM
 
 ...omitted...
  Again, no registrar check, sorry. You could eventually use the
  uri_whois nsname or the uri_whois nsaddr tests to attempt to
  catch these.
 
 I think I am missing something here. The NS address is different from
 the registrar.

Right, it is.

The URIWhois does not detect the registrar. It detects the name and the
address of the DNS- and whois-defined NSes for that domain.


 How can we score based on NS address? Can a spammer not put innocent
 servers as his nameservers, as long as they allow DNS queries for his
 host?

I guess there is a superfluous not in your question. If you are asking can
a spammer put an innocent server as his nameserver, etc, then the answer is:
yes, he/she can, but it wouldn't help him get past the URIWhois plugin.

What I meant in my scarce previous reply was that the URIWhois plugin builds
a list of the nameservers' names and addresses related to the URIs SA finds
in the message. These names and addresses are built by merging the ones
discovered through DNS queries with the ones discovered through whois
queries. Then, your rules may attempt to match DNS server names or addresses
from this list. You of course would attempt matching the list against names
and addresses which are well known to be bad, not against any innocent one.

There are many usage cases for the URIWhois plugin. However, basically,
spammers, like everybody else, often want full control over the
authoritative NSes of their domains. Thereby, you may easily protect
yourself even from future spam by creating a URIWhois rule matching the
block of addresses in which the whois-published nameservers sit: such a
block is often assigned to a single entity which, I bet, is less than innocent.

For example, whois says beekeenidotcom is handled by 210.14.128.172 and
210.14.128.112. These NSes are in the 210.14.128.0/13 address space, which is
assigned to a Chinese company. When I attempt to access its site I even get
a warning from my antivirus. Ok, I decide they are basically spammers. Then
I put this rule in my URIWhois.cf:

uri_whois SPAMDNSADDR nsaddr in 210.14.128.172/13
score SPAMDNSADDR {TheScoreILike}

and that's it: sites whose DNS authoritative servers are in these addresses
get a score, regardless of the NS name and of any attempt at DNS redirection
by them...

Of course, I can also group together more bunches of addresses:

uri_whois SPAMDNSADDR nsaddr in 210.14.128.172/13 1.1.1.0/24
2.2.2.0/24 etc, etc

Thereby, I don't need an SA rule for each and every bunch, as long as I want
to score them the same.

Anyway, apart from matching the DNS names and addresses published through
whois records, I recently discovered that most spam domains don't respond to
SOA and NS requests!!! This is quite easy to detect with very low CPU
consumption (i.e., asynchronously), at the cost that the message has to wait
for at least a couple of DNS request timeouts before asserting that the SOA
and/or NS replies are missing. By the way, RFC 1035 states that SOA and NS
requests MUST be answered by an authoritative nameserver.

Try this:

dig soa beekeenidotcom

...

Unfortunately this detection is not yet implemented in the URIWhois
RFC1035IGN rule.
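Since the rule isn't implemented, here is a rough shell sketch of the detection idea (this is not the plugin's code; the `+time`/`+tries` flags and the `ANSWER: 0,` stats-line format are assumptions about a typical dig, so check a real capture first):

```shell
#!/bin/sh
# soa_missing: reads `dig soa <domain>` output on stdin and succeeds
# (exit 0) when the stats line reports zero answers, i.e. the
# authoritative NS appears to be ignoring SOA queries.  Parsing only
# the stats line is a deliberate simplification.
soa_missing() {
  grep -q 'ANSWER: 0,'
}

# Real (network) use would look like:
#   dig +time=2 +tries=1 soa "$domain" | soa_missing && echo "no SOA reply"

# Offline demonstration against a canned dig stats line:
printf ';; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n' \
  | soa_missing && echo "no SOA reply"
```

A real scanner would also have to distinguish timeouts from explicit empty answers, which is where the couple-of-timeouts latency cost mentioned above comes from.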


 The format of the registrar in whois information is not standardized. I
 wonder why.  If I could do something like
 dig domain.tld REG ( just like dig domain.tld MX )
 then life would have been so simple.

The main problem here is that, when ICANN delegates control about gTLD
zones, there is not even a word in its agreements regarding public access
to registration records. gTLDs' Network Information Centers are basically
free to do whatever they like with their gTLD, provided they implement
something to let ICANN inspect their records. ICANN, not you or me...


 Thanks
 Ram

You're welcome,

Giampaolo



Re: Feeding SA-learn

2008-01-23 Thread John Thompson
On 2008-01-23, Diego Pomatta [EMAIL PROTECTED] wrote:

 I use Thunderbird. There are two files for that folder: Junk.msf (7k) 
 and Junk (53.172k). The msf file must be some kind of index. I just feed 
 the biggest one to sa-learn?

Yup. Use sa-learn --spam --mbox Junk to learn your spam. You'll want 
to use the --mbox switch so sa-learn will process it as an mbox format 
mailbox, since that's what Thunderbird uses to store mail.

-- 

John ([EMAIL PROTECTED])



Re: Feeding SA-learn

2008-01-23 Thread John Thompson
On 2008-01-23, Mark Johnson [EMAIL PROTECTED] wrote:

 My emails are stored on an IMAP server, and what you suggested wasn't 
 working for me.  I use Thunderbird as my mail client but have found that I 
 needed to use Evolution to save the messages in mbox format, which was 
 always a hassle.

mbox is already the format in which Thunderbird stores mail. What was 
the problem that caused you to use Evolution?

 Because the emails are kept on the IMAP server and are not local, I had 
 to enable the "Select this folder for offline use" option on the "Offline" 
 tab of the folder properties.  I then had the mbox file that I could copy off.

If you have shell access to the machine running the imap server, use a 
cron job on the server to feed your Junk into spamassassin. 

-- 

John ([EMAIL PROTECTED])



Re: Feeding SA-learn

2008-01-23 Thread Mark Johnson

John Thompson wrote:


Isn't that what cron is for? :-)

I have a cron job on my imap server to regularly feed ham and spam 
through sa-learn.




Do you delete the messages from the IMAP folder after you learn them? 
If so, how do you go about that?  I'm pretty sure that if I deleted the mail 
files from the command line, I'd have to run a reconstruct on the mailbox 
or the folder would throw errors on the client.  This is on a Cyrus IMAP server.


Thanks!

--
Mark Johnson
http://www.astroshapes.com/information-technology/blog/


Re: Expiry problem

2008-01-23 Thread Matt Kettler

Steven Stern wrote:
We had a server go crazy last night and reset its date into August of 
2277.  In any case, we've resolved that, but now I can't get bayes to 
expire.


After the clock was correctly set, I deleted all tokens that had a 
lastupdate in the future, and also removed similar bayes_seen rows.  I 
then reset the token count in bayes_vars to the correct value.

When I try to run sa-learn --force-expire, nothing gets expired and 
the token list keeps growing.  Will this get better on its own or do I 
need to intervene?

You might need to ditch your bayes database.

The database will, over time, partially fix itself, but right now any 
one-off tokens learned while the date was off are stuck in your bayes 
DB until 2277. SA's expiry method is based on the age of a token, 
i.e. when it was last accessed. That method has absolutely no way to 
deal with atimes that are in the future, so it will never try to expire 
those tokens.


It can partially fix itself, because every time a token gets accessed, 
its atime gets updated. So as the more common tokens get used, they'll 
start rotating out as they would normally. However, any unique tokens 
are stuck there.


If you're *really* desperate to preserve the bayes DB, you could wait a 
couple days, do a sa-learn --backup, use grep to remove all the lines 
with absurd atimes, then use sa-learn --restore. That's a good bit of 
work to go through...
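The backup-filter-restore route can be sketched in shell. The assumption here (verify it against your own dump before restoring) is the SA 3.x text-backup format, where token lines start with `t` and the atime is the fourth whitespace-separated field:

```shell
#!/bin/sh
NOW=$(date +%s)

# Stand-in for `sa-learn --backup` output: a header line, one sane
# token, and one token whose atime is out in 2277 (~9.6 billion).
printf 'v\t3\t3\nt\t2\t0\t1201000000\tdeadbeef\nt\t1\t0\t9600000000\tcafebabe\n' \
  > bayes.backup

# Keep every non-token line, and token lines whose atime (4th field)
# is not in the future.
awk -v now="$NOW" '$1 != "t" || $4 + 0 <= now + 0' bayes.backup > bayes.cleaned

cat bayes.cleaned    # the cafebabe token from 2277 is gone

# The real workflow would be:
#   sa-learn --backup > bayes.backup
#   (filter as above)
#   sa-learn --clear && sa-learn --restore bayes.cleaned
```

Filtering numerically on the atime column is more robust than a bare grep, since a pattern like `^t.*9[0-9]{9}` could also match token hashes.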


If you decide to go this route: for reference, and assuming my 
scratchpad math is right, the atimes for 2277 should be around 9.6 
billion, while the ones for 2008 should be around 1.2 billion. Of 
course, that's assuming the atimes are stored as 64-bit values and aren't 
wrapping as 32-bit numbers. However, if that were the case, they'd be 
wrapping to 2004, and your expire numbers should show really high token 
eliminations, not really low.



Re: Expiry problem

2008-01-23 Thread Steven Stern

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/23/2008 07:35 PM, Matt Kettler wrote:
| Steven Stern wrote:
| We had a server go crazy last night and reset its date into August of
| 2277.  In any case, we've resolved that, but now I can't get bayes to
| expire.
|
| After the clock was correctly set, I deleted all tokens that had a
| lastupdate in the future, and also removed similar bayes_seen rows.  I
| then reset the token count in bayes_vars to the correct value.
| When I try to run sa-learn --force-expire, nothing gets expired and
| the token list keeps growing.  Will this get better on its own or do I
| need to intervene?
| You might need to ditch your bayes database.
|
| The database will, over time, partially fix itself, but right now any
| one-off tokens learned while the date was off are stuck in your bayes
| DB until 2277. SA's expiry method is based on the age of a token,
| i.e. when it was last accessed. That method has absolutely no way to
| deal with atimes that are in the future, so it will never try to expire
| those tokens.
|
| It can partially fix itself, because every time a token gets accessed,
| its atime gets updated. So as the more common tokens get used, they'll
| start rotating out as they would normally. However, any unique tokens
| are stuck there.
|
| If you're *really* desperate to preserve the bayes DB, you could wait a
| couple days, do a sa-learn --backup, use grep to remove all the lines
| with absurd atimes, then use sa-learn --restore. That's a good bit of
| work to go through...
|
| If you decide to go this route:  For reference, and assuming my
| scratchpad math is right, the atimes for 2277 should be around 9.6
| billion, while the ones for 2008 should be around 1.2 billion. Of
| course, that's assuming the atimes are stored 64 bit and aren't wrapping
| as 32 bit numbers.. However, if that were the case, they'd be wrapping
| to 2004, and your expire numbers should show really high token
| eliminations, not really low..
|

It's finally started to remove tokens, so I think I'm OK. We use SQL
bayes, so it was an easy matter to use

~  delete from bayes_token where atime > UNIX_TIMESTAMP();

to clean up the stuff from the future.


- --

~  Steve
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHmAwSeERILVgMyvARAmkBAJ4od1lX/wXYdadek1deySDYZi4SQgCfcskW
dOHVuSkn5UeKZUGYJjA6J2A=
=c5W9
-END PGP SIGNATURE-


Re: whois plugin .. where to get it

2008-01-23 Thread Matt Kettler

Giampaolo Tomassoni wrote:


Right, it is.

The URIWhois does not detect the registrar. It detects the name and the
address of the DNS- and whois-defined NSes for that domain.
  


So how is this substantially different from the URIDNSBL plugin that 
comes with SA?


Bear in mind the stock URIDNSBL plugin *DOES* resolve the NSes for the 
domain, and DOES check those too. Take for example URIBL_SBL, which only 
makes sense in the context of the IP of the nameservers (since it's an 
IP-based RBL). I guess you could say that looking up the IP of the host 
in the URL would also work, but that's an invitation for DoS, so it's not 
something URIDNSBL does.


The only big difference I see at face value is that it uses whois instead 
of DNS to find the NS records... that hardly seems efficient.



Re: whois plugin .. where to get it

2008-01-23 Thread Jeff Chan

Quoting Matt Kettler [EMAIL PROTECTED]:


The only big difference I see at face value is it uses whois instead of
DNS to find the NS records.. that hardly seems efficient..


Whois is definitely the wrong protocol to use for automated testing,  
especially for any high volumes.  It was not designed or intended for  
that purpose, which is arguably abusive.


Jeff C.