Re: Ruleset load order dependencies
On 22.01.08 17:34, byrnejb wrote:
>> No. The problem is that you don't have the modules loaded which would let
>> the rules get defined. The meta dependencies are checked after everything
>> has loaded.
>
> How do I ensure that the proper modules are loaded, and what are they called?

Please learn to quote, and don't prefix the original text with the "-- " string; that is a signature separator.

Uncomment the razor, pyzor and DCC modules. Note that they all need external programs to work, and DCC requires its own DCC server for mail servers handling 100k messages per day.

-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
42.7 percent of all statistics are made up on the spot.
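For reference, the plugins Matus mentions are loaded from the *.pre files (v310.pre in a stock 3.1.x/3.2.x install; paths and which lines ship commented out vary by version and distro). Uncommenting lines like these enables them, assuming the razor-agents, pyzor and dccproc/dccifd external programs are installed:

```
# in /etc/mail/spamassassin/v310.pre (location may differ per distro)
loadplugin Mail::SpamAssassin::Plugin::DCC
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
```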
Re: Feeding SA-learn
Anthony Peacock wrote:
>>>> Can I feed a plain text file representing just the body of a message
>>>> to sa-learn? /Diego
>>> Yes you can, who's to stop it? I just fed your message body in as
>>> --ham, and it told me it learned one message.
>> I meant without the headers, just the body. OK, thanks.
> Well, the short answer is: yes, you can. The slightly longer answer is
> that you won't get as good results doing this, as the Bayes system uses
> tokens found in the complete message. By only learning on the body you
> will not gain any advantage from tokens found in headers.

Yep, I know; the problem is precisely that I don't have the original headers after the mail has been delivered. My intention was to manually feed the few spam messages that slip through undetected. By the time I get hold of those, they are in the recipient's mail client inbox, not on the server. I was thinking: if I save the mail as EML files, would that preserve the headers in a way that sa-learn can parse correctly?

Thanks,
/Diego
Re: Feeding SA-learn
Diego Pomatta wrote:
>>>>> Can I feed a plain text file representing just the body of a message
>>>>> to sa-learn? /Diego
>>>> Yes you can, who's to stop it? I just fed your message body in as
>>>> --ham, and it told me it learned one message.
>>> I meant without the headers, just the body. OK, thanks.
>> Well, the short answer is: yes, you can. The slightly longer answer is
>> that you won't get as good results doing this, as the Bayes system uses
>> tokens found in the complete message. By only learning on the body you
>> will not gain any advantage from tokens found in headers.
> Yep, I know; the problem is precisely that I don't have the original
> headers after the mail has been delivered. My intention was to manually
> feed the few spam messages that slip through undetected. By the time I
> get hold of those, they are in the recipient's mail client inbox, not on
> the server. I was thinking, if I save the mail as EML files, would that
> preserve the headers in a way that sa-learn can parse correctly?

Depends on the client. For instance, Thunderbird stores its folders in mbox format, so sa-learn can work against those files as-is. Other email clients can save emails in text format complete with headers. The biggest problem with this is training the users to do that consistently.

-- 
Anthony Peacock
CHIME, Royal Free University College Medical School
WWW: http://www.chime.ucl.ac.uk/~rmhiajp/
"A CAT scan should take less time than a PET scan. For a CAT scan, they're only looking for one thing, whereas a PET scan could result in a lot of things." - Carl Princi, 2002/07/19
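Since sa-learn reads mbox files with Python-parseable framing, a quick way to sanity-check that a Thunderbird folder file (e.g. "Junk") really is an intact mbox with headers preserved is the standard library's mailbox module. This is an illustrative sketch, not SpamAssassin code; the sample message below is made up, and in practice you would point `path` at the real folder file:

```python
# Sketch: verify a Thunderbird folder file parses as mbox (headers intact)
# before feeding it to sa-learn.  The message content here is hypothetical.
import mailbox
import os
import tempfile

sample = (
    "From spammer@example.com Thu Jan 24 10:00:00 2008\n"
    "From: spammer@example.com\n"
    "Subject: test spam\n"
    "\n"
    "body text\n"
    "\n"
)

# Stand-in for the real Thunderbird "Junk" file:
with tempfile.NamedTemporaryFile("w", suffix=".mbox", delete=False) as f:
    f.write(sample)
    path = f.name

box = mailbox.mbox(path)
n = len(box)                  # number of messages sa-learn would see
subj = box[0]["Subject"]      # headers survive, which is what Bayes needs
print(n, subj)                # prints: 1 test spam
os.unlink(path)
```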
Re: more efficient big scoring
To clarify -- here's how the current code orders rule evaluation:

- message metadata is extracted
- header DNSBL tests are started
- the decoded forms of the body text are extracted and cached
- the URIs in the message body are extracted and cached
- it iterates through each known priority level defined in the active ruleset, from lowest to highest, and:
  - checks to see if it has shortcircuited; if it has, it breaks the loop
  - calls the 'check_rules_at_priority' plugin hook
  - runs all header and header-eval rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all body and body-eval rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all uri rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all rawbody and rawbody-eval rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all full and full-eval rules defined for that priority level
  - checks to see if there are network rules to harvest
  - runs all meta rules defined for that priority level (note: if the meta rules depend on a network rule, this may block until that rule completes)
  - checks to see if there are network rules to harvest
  - calls the check_tick plugin hook
- finally, it waits for any remaining unharvested network rules (if it hasn't shortcircuited)
- calls the check_post_dnsbl plugin hook
- auto-learns from the message, if applicable
- calls the check_post_learn plugin hook
- and returns

In 3.2.x and 3.3.0 this is all in the Check plugin, in the check_main() method, so it can be redefined or overridden with alternative orderings quite easily.

--j.

Loren Wilton writes:
> maybe if there was some way to establish a hierarchy at startup which
> groups rule processing into nodes. some nodes finish quickly, some have
> dependencies, some are negative, etc.
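The priority loop above can be sketched in a few lines. This is an illustrative Python outline, not the actual Perl in Mail::SpamAssassin::Plugin::Check; every name in it (the callbacks, the rule-type list ordering by string) is invented for the example:

```python
# Illustrative sketch of the priority-loop ordering described above.
# Not SpamAssassin code; all identifiers here are made up.
RULE_TYPES = ["header", "body", "uri", "rawbody", "full", "meta"]

def check_main(priorities, run_rules, harvest_net, shortcircuited):
    """Run each rule type at each priority level, lowest priority first."""
    executed = []
    for prio in sorted(priorities):
        if shortcircuited():
            break                      # a shortcircuit rule already fired
        for rtype in RULE_TYPES:
            run_rules(prio, rtype, executed)
            harvest_net(executed)      # collect any finished net answers
    return executed

# Tiny demo with stub callbacks:
log = check_main(
    priorities=[0, -100, 500],
    run_rules=lambda p, t, out: out.append((p, t)),
    harvest_net=lambda out: None,
    shortcircuited=lambda: False,
)
print(log[0])    # lowest priority level runs first, header rules first
```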
> Just wanted to point out, this topic came out when the site DNS cache
> service started to fail due to excessive DNSBL queries. My slowdown was
> due to multiple timeouts and/or delays, probably related to answering
> joe-job rbldns backscatter -- that's the reason I was looking for an
> early exit on scans in process.

There is a little splitting of rules into processing-speed groups done already. Specifically, the net-based tests, being dependent on external events for completion, are split out from the other tests and are processed in two phases. The first phase issues the request for information over the net, and the second phase then waits for an answer. There is a background routine harvesting incoming net results while other rules are processed, so when a net result is required it may already be present and no delay will be incurred.

This is not an area I understand at all fully, but reading moderately recent comments on Bugzilla leads me to believe that this is an area where some improvement is still possible; there are some net tests that (I think) end up waiting immediately for an answer rather than doing the two-phase processing. How much that slows down the result for the overall email probably depends on many factors.

Also note that even issuing the requests and then waiting for the result only when it is needed doesn't guarantee that the mail will not have to wait for results. It could be that one of the very first rules processed (due to priority or meta dependency, for instance) will need a net result, and so the entire rule process will be forced to wait on it.

As far as splitting non-net rules up based on speed, that isn't very practical. Regex rules should in general be quite fast, and all of them are going to require the use of the processor full-time anyway. The speed of the rule will depend on how it is written and the exact content of the email it is processing. So a rule that is dog slow on one email may be blindingly fast on most other emails.
I don't know that there is any good way to estimate the speed of a regex simply by looking at it.

Loren
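The "issue now, harvest later" two-phase pattern Loren describes can be illustrated with a thread pool. This is only a sketch of the pattern, assuming stand-in lookups; SpamAssassin's real implementation is asynchronous Perl DNS code, and the lookup function and zone names below are placeholders:

```python
# Sketch of the two-phase "issue the query, harvest the answer later"
# pattern described above.  fake_dnsbl_lookup is a stand-in for a real
# DNS query; nothing here is SpamAssassin code.
from concurrent.futures import ThreadPoolExecutor

def fake_dnsbl_lookup(zone):
    return f"{zone}: not listed"       # placeholder result

zones = ["zen.spamhaus.org", "bl.spamcop.net"]

with ThreadPoolExecutor() as pool:
    # Phase 1: fire off all lookups before running any other rules.
    pending = {zone: pool.submit(fake_dnsbl_lookup, zone) for zone in zones}

    # ... CPU-bound regex rules would run here while answers arrive ...

    # Phase 2: harvest; ideally each answer is already in by now, so
    # .result() returns without blocking.
    results = {zone: fut.result() for zone, fut in pending.items()}

print(results["bl.spamcop.net"])
```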
Re: Feeding SA-learn
Anthony Peacock wrote:
>>> Well, the short answer is: yes, you can. The slightly longer answer is
>>> that you won't get as good results doing this, as the Bayes system uses
>>> tokens found in the complete message. By only learning on the body you
>>> will not gain any advantage from tokens found in headers.
>> Yep, I know; the problem is precisely that I don't have the original
>> headers after the mail has been delivered. My intention was to manually
>> feed the few spam messages that slip through undetected. By the time I
>> get hold of those, they are in the recipient's mail client inbox, not on
>> the server. I was thinking, if I save the mail as EML files, would that
>> preserve the headers in a way that sa-learn can parse correctly?
> Depends on the client. For instance, Thunderbird stores its folders in
> mbox format, so sa-learn can work against those files as-is. Other email
> clients can save emails in text format complete with headers.

I use Thunderbird. There are two files for that folder: Junk.msf (7k) and Junk (53.172k). The .msf file must be some kind of index. Do I just feed the bigger one to sa-learn?

/Regards
Re: Feeding SA-learn
Diego Pomatta wrote:
>> Depends on the client. For instance, Thunderbird stores its folders in
>> mbox format, so sa-learn can work against those files as-is. Other email
>> clients can save emails in text format complete with headers.
> I use Thunderbird. There are two files for that folder: Junk.msf (7k)
> and Junk (53.172k). The .msf file must be some kind of index. Do I just
> feed the bigger one to sa-learn?

Yes, the .msf file is an index file. I just copy the mbox file (Junk in your case) to the server and run the following command, specifying the filename (as shown):

  /usr/local/bin/spamassassin --report --mbox Junk

-- 
Anthony Peacock
CHIME, Royal Free University College Medical School
Re: Spamd and MySQL userprefs/ AWL/ Bayes
On Tue, 2008-01-22 at 12:49 -0600, Michael Parker wrote:
> On Jan 22, 2008, at 12:17 PM, Rubin Bennett wrote:
>> On Tue, 2008-01-22 at 10:45 -0600, Michael Parker wrote:
>>> On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:
>>>> WTF am I doing wrong?!
>>> Not including debug logs in your message. User prefs does not work with
>>> spamassassin, so you won't see anything there, but you should be seeing
>>> something for Bayes SQL and AWL SQL if they are configured correctly.
>> What do you mean?! Isn't that what user_scores_dsn is all about?!
> The spamassassin script. User prefs only works when you run via spamd.
> But let's look at the debug output:
>
>> [31490] dbg: bayes: using username: root
>> [31490] dbg: bayes: database connection established
>> [31490] dbg: bayes: found bayes db version 3
>> [31490] dbg: bayes: Using userid: 1
>
> Ok, this tells me that Bayes SQL looks to be running just fine. If you
> read sql/README.bayes it tells you what to look for to test if things
> are working correctly.
>
>> [31490] dbg: bayes: corpus size: nspam = 2106, nham = 19051
>> [31490] dbg: bayes: tok_get_all: token count: 20
>> [31490] dbg: bayes: score = 0.472224419305046
>> [31490] dbg: bayes: DB expiry: tokens in DB: 133258, Expiry max size: 15, Oldest atime: 1193647841, Newest atime: 1201025739, Last expire: 1195029791, Current time: 1201025739
>
> It even looks like you've got some data in there.

You're right, it does appear to be connecting to the database for bayes. Spamd output below:

[EMAIL PROTECTED] ~]# spamd -q -D
[12373] dbg: logger: adding facilities: all
[12373] dbg: logger: logging level is DBG
[12373] dbg: logger: trying to connect to syslog/unix...
[12373] dbg: logger: opening syslog with unix socket
[12373] dbg: logger: successfully connected to syslog/unix
[12373] dbg: logger: successfully added syslog method
[12373] dbg: spamd: will perform setuids? 1
[12373] dbg: spamd: creating INET socket:
[12373] dbg: spamd: Listen: 128
[12373] dbg: spamd: LocalAddr: 127.0.0.1
[12373] dbg: spamd: LocalPort: 783
[12373] dbg: spamd: Proto: 6
[12373] dbg: spamd: ReuseAddr: 1
[12373] dbg: spamd: Type: 1
[12373] dbg: logger: adding facilities: all
[12373] dbg: logger: logging level is DBG
[12373] dbg: generic: SpamAssassin version 3.2.3
[12373] dbg: config: score set 0 chosen.
[12373] dbg: dns: is Net::DNS::Resolver available? yes
[12373] dbg: dns: Net::DNS version: 0.61
[12373] dbg: learn: initializing learner
[12373] dbg: config: using /etc/mail/spamassassin for site rules pre files
[12373] dbg: config: read file /etc/mail/spamassassin/init.pre
[12373] dbg: config: read file /etc/mail/spamassassin/v310.pre
[12373] dbg: config: read file /etc/mail/spamassassin/v312.pre
[12373] dbg: config: read file /etc/mail/spamassassin/v320.pre
[12373] dbg: config: using /var/lib/spamassassin/3.002003 for sys rules pre files
[12373] dbg: config: using /var/lib/spamassassin/3.002003 for default rules dir
[12373] dbg: config: read file /var/lib/spamassassin/3.002003/updates_spamassassin_org.cf
[12373] dbg: config: using /etc/mail/spamassassin for site rules dir
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_adult.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum0.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum1.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum2.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_eng.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_x30.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_header.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_header0.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_header_eng.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_header_x264_x30.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_header_x30.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_highrisk.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_html.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_html_eng.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_html_x30.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_oem.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_random.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_specific.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_spoof.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_stocks.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_unsub.cf
[12373] dbg: config: read file /etc/mail/spamassassin/70_sare_uri0.cf
[12373] dbg: config: read
Re: Feeding SA-learn
>>> Depends on the client. For instance, Thunderbird stores its folders in
>>> mbox format, so sa-learn can work against those files as-is. Other
>>> email clients can save emails in text format complete with headers.
>> I use Thunderbird. There are two files for that folder: Junk.msf (7k)
>> and Junk (53.172k). The .msf file must be some kind of index. Do I just
>> feed the bigger one to sa-learn?
> Yes, the .msf file is an index file. I just copy the mbox file (Junk in
> your case) to the server and run the following command, specifying the
> filename (as shown):
>
>   /usr/local/bin/spamassassin --report --mbox Junk

I use Thunderbird as my mail client but have found that I needed to use Evolution to save the messages in mbox format, which was always a hassle. My emails are stored on an IMAP server and what you suggested wasn't working for me: I had the .msf file, but no corresponding mbox file. Because the emails are kept on the IMAP server and are not local, I had to enable "Select this folder for offline use" on the Offline tab of the folder properties. I then had the mbox file that I could copy off.

-- 
Mark Johnson
http://www.astroshapes.com/information-technology/blog/
Re: Feeding SA-learn
Mark Johnson wrote:
> My emails are stored on an IMAP server and what you suggested wasn't
> working for me: I had the .msf file, but no corresponding mbox file.
> Because the emails are kept on the IMAP server and are not local, I had
> to enable "Select this folder for offline use" on the Offline tab of the
> folder properties. I then had the mbox file that I could copy off.

Good point. I use this on folders that are saved on the local hard disk.

-- 
Anthony Peacock
CHIME, Royal Free University College Medical School
Re: Spamd and MySQL userprefs/ AWL/ Bayes
On Jan 23, 2008, at 6:37 AM, Rubin Bennett wrote:
> Spamd output below:
> [EMAIL PROTECTED] ~]# spamd -q -D
> [12373] dbg: logger: adding facilities: all
> [12373] dbg: logger: logging level is DBG

Can you run this again, and this time pass 1-2 messages through just like you would normally, instead of just the default prime-the-pump message? Also, please remind me of your spamd startup options, and maybe even attach your local.cf (or wherever you're adding the SQL config items) for good measure.

Michael
Feedback on 3.2.4
Other than the initial reports of performance boost from 3.2.4, I haven't seen much discussion on it as yet. Perhaps it is still too soon to know, but has anyone been seeing other benefits - or identified potential problems? - Skip
RE: more efficient big scoring
> Just wanted to point out, this topic came out when the site DNS cache
> service started to fail due to excessive DNSBL queries. My slowdown was
> due to multiple timeouts and/or delays, probably related to answering
> joe-job rbldns backscatter -- that's the reason I was looking for an
> early exit on scans in process.
>
> // George
> George Georgalis, information system scientist IXOYE

George,

That is correct! I still maintain that the SA Team is more than bright and talented enough that, over time, they will come up with new algorithms to allow this behavior without a substantial SA processing-speed decrease. I just cannot imagine that the theoretical and real-world limits of these functions have been reached yet. And the bottom line is: even if a function initially slows down processing, if the theory behind it is sound, shouldn't it be pursued until it can be implemented properly? ...or is that just wishful thinking?

jdow has suggested some reading to enlighten me that I will be getting to in short order.

- rh
RE: whois plugin .. where to get it
On Wed, 23 Jan 2008, ram wrote:
>> Allegedly 100% spam. Innocent until proven guilty, etc. NUCLEAR NAMES, INC.
> I would love to block all domains with these, but to think of it, what is
> there to prevent them from getting themselves whitelisted by registering
> good domains?

There's a lot of difference between "not blacklisted" and "whitelisted". Not using a spam-friendly registrar does not mean they will get a pass, just that they won't get points for having used a spam-friendly registrar.

> They can register one more domain with an innocent website (say a wiki
> news site) etc. Now they are less than 100% spammer registrars.

Oh, I see what you mean. Adding some score for spam-friendly (not necessarily 100% spammy) is reasonable; using the registrar as a poison pill is not.

Plus, what legitimate domain owner would wish to knowingly register their domain with a registrar that has a truly bad reputation? Plus, they can leave that registrar rather easily.

-- 
John Hardin KA7OHZ  http://www.impsec.org/~jhardin/
[EMAIL PROTECTED]  FALaholic #11174
pgpk -a [EMAIL PROTECTED]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79

If Microsoft made hammers, everyone would whine about how poorly screws were designed and about how they are hard to hammer in, and wonder why it takes so long to paint a wall using the hammer.

4 days until the 41st anniversary of the loss of Apollo 1
Re: Feedback on 3.2.4
Skip wrote:
> Other than the initial reports of performance boost from 3.2.4, I haven't
> seen much discussion on it as yet. Perhaps it is still too soon to know,
> but has anyone been seeing other benefits - or identified potential
> problems?

No problems with it at all here (around 7 servers upgraded) and the performance is greatly increased. I went from a 1.4 second average scan time to 0.6 seconds average.

HTH,
Rick
Re: Spamd and MySQL userprefs/ AWL/ Bayes
Here you go, and thanks! Output of spamd -q -D, left running for a while.

Well... it reveals that it is in fact pulling my userprefs from SQL, but it was ignoring the ones with the $GLOBAL username. Apparently it now requires the @GLOBAL username instead (which, IIRC, it didn't at some point in the past). So it is working, and I thank you all for your input.

Lesson learned: for MySQL configs, use:

  spamd -q -x -d

and make sure your global config in MySQL is set for the @GLOBAL user.

Rubin

On Wed, 2008-01-23 at 10:35 -0600, Michael Parker wrote:
> On Jan 23, 2008, at 6:37 AM, Rubin Bennett wrote:
>> Spamd output below:
>> [EMAIL PROTECTED] ~]# spamd -q -D
>> [12373] dbg: logger: adding facilities: all
>> [12373] dbg: logger: logging level is DBG
> Can you run this again and this time pass 1-2 msgs through just like you
> would normally, instead of just the default prime-the-pump message. Also,
> please remind me of your spamd startup options and maybe even attach your
> local.cf (or wherever you're adding the sql config items) file for good
> measure.
>
> Michael
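For anyone finding this thread later, the local.cf settings involved look roughly like this. The option names are real SpamAssassin configuration directives (see sql/README and sql/README.bayes in the distribution), but the DSN, database name and credentials below are placeholders you would replace with your own:

```
# SQL user preferences -- only honoured when running via spamd,
# not by the spamassassin script
user_scores_dsn            DBI:mysql:spamassassin:localhost
user_scores_sql_username   sauser
user_scores_sql_password   sapass

# Bayes data in SQL
bayes_store_module         Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn              DBI:mysql:spamassassin:localhost
bayes_sql_username         sauser
bayes_sql_password         sapass
```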
Expiry problem
We had a server go crazy last night and reset its date into August of 2277. In any case, we've resolved that, but now I can't get Bayes to expire. After the clock was correctly set, I deleted all tokens that had a lastupdate in the future, and also removed similar bayes_seen rows. I then reset the token count in bayes_vars to the correct value. When I try to run sa-learn --force-expire, nothing gets expired and the token list keeps growing. Will this get better on its own, or do I need to intervene?

[14256] dbg: bayes: using username: root
[14256] dbg: bayes: database connection established
[14256] dbg: bayes: found bayes db version 3
[14256] dbg: bayes: Using userid: 1
[14256] dbg: config: score set 3 chosen.
[14256] dbg: learn: initializing learner
[14256] dbg: bayes: bayes journal sync starting
[14256] dbg: bayes: bayes journal sync completed
[14256] dbg: bayes: expiry starting
[14256] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[14256] dbg: bayes: token count: 443162, final goal reduction size: 330662
[14256] dbg: bayes: first pass? current: 1201117198, Last: 1201117194, atime: 43200, count: 1231, newdelta: 160, ratio: 268.612510154346, period: 43200
[14256] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
[14256] dbg: bayes: expiry max exponent: 9
[14256] dbg: bayes: atime token reduction
[14256] dbg: bayes: ===
[14256] dbg: bayes: 43200 528
[14256] dbg: bayes: 86400 0
[14256] dbg: bayes: 172800 0
[14256] dbg: bayes: 345600 0
[14256] dbg: bayes: 691200 0
[14256] dbg: bayes: 1382400 0
[14256] dbg: bayes: 2764800 0
[14256] dbg: bayes: 5529600 0
[14256] dbg: bayes: 11059200 0
[14256] dbg: bayes: 22118400 0
[14256] dbg: bayes: couldn't find a good delta atime, need more token difference, skipping expire
[14256] dbg: bayes: expiry completed
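What that debug output is showing, roughly: expiry tries candidate atime cutoffs of 43200 seconds doubled up to 2^9 times, counting how many tokens each cutoff would remove, and skips the run when no cutoff gets anywhere near the reduction goal. Here almost all atimes ended up recent after the clock cleanup, so every candidate removes ~0 tokens. A much-simplified Python sketch of that search (not the actual Mail::SpamAssassin::BayesStore logic, which also uses an estimation pass first):

```python
# Simplified sketch of the Bayes expiry atime-delta search described above.
# Illustrative only; not SpamAssassin's actual algorithm.
def find_expiry_delta(token_atimes, now, goal, base=43200, max_exp=9):
    """Return (delta, removed) for the smallest cutoff reaching the goal,
    or (None, 0) if no candidate delta removes enough tokens."""
    for exp in range(max_exp + 1):
        delta = base * (2 ** exp)
        removed = sum(1 for t in token_atimes if now - t > delta)
        if removed >= goal:
            return delta, removed
    return None, 0   # "couldn't find a good delta atime ... skipping expire"

now = 1201117198
# All token atimes recent, as after deleting the future-dated tokens:
delta, removed = find_expiry_delta([now - 3600] * 1000, now, goal=330662)
print(delta)   # prints: None -- nothing is old enough to expire
```

In other words, the situation should improve on its own: once enough tokens age past the smallest cutoff relative to the reduction goal, expiry will find a usable delta again.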
sa-learn errors.
Hi all. I have been having issues with SA for a while, most of my requests for help going unheard. I've managed to upgrade SA and fix most of the errors, but am still getting a couple that I've not been able to fix yet. Can someone please help with this? We are running Debian Sarge with SA 3.1.7 from backports.

Whenever I try to train with sa-learn from our missed-spam folder, I get these errors. And, since I've tried to run them, I also get these errors whenever I manually run spamd or SpamAssassin:

mailserver:~# sa-learn --spam /home/vpopmail/domains/mail.ourdomain.net/spam/Maildir/.Missed\ Spam.20080124/cur/
Bareword MAX_URI_LENGTH not allowed while strict subs in use at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2010.
Bareword MAX_URI_LENGTH not allowed while strict subs in use at /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2012.
Compilation failed in require at /usr/share/perl5/Mail/SpamAssassin.pm line 72.
BEGIN failed--compilation aborted at /usr/share/perl5/Mail/SpamAssassin.pm line 72.
Compilation failed in require at /usr/bin/sa-learn line 78.
BEGIN failed--compilation aborted at /usr/bin/sa-learn line 78.

I am used to getting similar sa-learn errors, but not ones that cause problems when spamd or SpamAssassin is manually run. Can anyone please explain what "strict subs" is used for, and whether I should disable it to allow MAX_URI_LENGTH to work properly?

Cheers,
Michael Hutchinson
http://www.manux.co.nz
Re: sa-learn errors.
On Thu, 24 Jan 2008, Michael Hutchinson wrote:
> Bareword MAX_URI_LENGTH not allowed while strict subs in use at
> /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2010.
> Bareword MAX_URI_LENGTH not allowed while strict subs in use at
> /usr/share/perl5/Mail/SpamAssassin/PerMsgStatus.pm line 2012.
> Compilation failed in require at /usr/share/perl5/Mail/SpamAssassin.pm
> line 72.

Those are compile errors in the core SA code. Your install appears to be corrupted. Has anyone been editing the files under /usr/share/perl5/Mail/? You will probably need to wipe and reinstall SA from scratch. Note that your local rules and Bayes database shouldn't be affected by doing this.

> I am used to getting similar sa-learn errors, but not ones that cause
> problems when spamd or SpamAssassin is manually run.

You may have two different copies of SA installed, and one is bad. This can happen if you install SA from a distro package and then later attempt to install or upgrade from CPAN (or vice versa).

> Can anyone please explain what "strict subs" is used for, and whether I
> should disable it to allow MAX_URI_LENGTH to work properly?

Those are Perl language options; you shouldn't be fiddling around with that stuff unless you're an SA developer or you want to modify SA itself (as opposed to just creating rules or doing other common administrative tasks).

-- 
John Hardin KA7OHZ
RE: sa-learn errors.
John wrote:
> Those are compile errors in the core SA code. Your install appears to be
> corrupted. Has anyone been editing the files under /usr/share/perl5/Mail/?
> You will probably need to wipe and reinstall SA from scratch. Note that
> your local rules and Bayes database shouldn't be affected by doing this.

Thanks for the reply, John. I believe there is a problem with more than one perl5/Mail directory hanging around on the system, which I will be addressing shortly -- not that this should be an issue; SA is configured to read one config directory, not two.

> You may have two different copies of SA installed, and one is bad. This
> can happen if you install SA from a distro package and then later attempt
> to install or upgrade from CPAN (or vice versa).

I'd say you've hit the nail on the head. I recently did an upgrade with dpkg -i at the previous admin's recommendation. Methinks the method used to install the original SA was different -- probably compiled from source, though, rather than CPAN.

> Those are Perl language options; you shouldn't be fiddling around with
> that stuff unless you're an SA developer or you want to modify SA itself.

OK, that I can understand. Thank you very much for your response, John. I now have a plan for the weekend: remove SA completely and reinstall it after backing up the config and Bayes data. Thanks again!

Cheers,
Michael.
Re: Feedback on 3.2.4
Rick Macdougall wrote:
> Skip wrote:
>> Other than the initial reports of performance boost from 3.2.4, I
>> haven't seen much discussion on it as yet. Perhaps it is still too soon
>> to know, but has anyone been seeing other benefits - or identified
>> potential problems?
> No problems with it at all here (around 7 servers upgraded) and the
> performance is greatly increased. I went from a 1.4 second average scan
> time to 0.6 seconds average.

Is this without network tests? Because on my server I had:

Begin: 2008-01-01  End: 2008-01-15  Summary: 3.1.8

   Cnt     %%   Average       Min       Max
 -----  -----  --------  --------  --------
 18968  46.2%     7.837     1.861    10.000
 16640  40.6%    13.654    10.001    19.999
  2916   7.1%    23.892    20.003    30.000
  1379   3.4%    38.132    30.002    59.882
   184   0.4%    74.994    60.041    89.753
    37   0.1%    99.552    90.282   118.884
   904   2.2%   154.578   120.272   364.923

Begin: 2008-01-21  End: 2008-01-24  Summary: version 3.2.4

   Cnt     %%   Average       Min       Max
 -----  -----  --------  --------  --------
  5302  44.9%     7.431     3.872    10.000
  4737  40.1%    13.643    10.002    19.998
   869   7.4%    24.003    20.008    29.982
   555   4.7%    41.017    30.001    59.947
   126   1.1%    72.529    60.201    89.941
    24   0.2%   101.170    90.641   118.022
   201   1.7%   154.700   120.454   188.119

Because, going just by the percentages, scan time is roughly the same on exactly the same hardware.

-- 
Jorge Valdes
Re: Feedback on 3.2.4
Jorge Valdes wrote:
>> No problems with it at all here (around 7 servers upgraded) and the
>> performance is greatly increased. I went from a 1.4 second average scan
>> time to 0.6 seconds average.
>
> Is this without network tests? Because on my server I had:
>
> Begin: 2008-01-01  End: 2008-01-15  Summary: 3.1.8
>
>    Cnt     %%   Average       Min       Max
>  -----  -----  --------  --------  --------
>  18968  46.2%     7.837     1.861    10.000
>  16640  40.6%    13.654    10.001    19.999
>   2916   7.1%    23.892    20.003    30.000
>   1379   3.4%    38.132    30.002    59.882
>    184   0.4%    74.994    60.041    89.753
>     37   0.1%    99.552    90.282   118.884
>    904   2.2%   154.578   120.272   364.923
>
> Begin: 2008-01-21  End: 2008-01-24  Summary: version 3.2.4
>
>    Cnt     %%   Average       Min       Max
>  -----  -----  --------  --------  --------
>   5302  44.9%     7.431     3.872    10.000
>   4737  40.1%    13.643    10.002    19.998
>    869   7.4%    24.003    20.008    29.982
>    555   4.7%    41.017    30.001    59.947
>    126   1.1%    72.529    60.201    89.941
>     24   0.2%   101.170    90.641   118.022
>    201   1.7%   154.700   120.454   188.119
>
> Because, going just by the percentages, scan time is roughly the same on
> exactly the same hardware.

Yup, full scanning including network tests, and Bayes stored in a networked MySQL server. Hardware is Dell 860s (I believe, could be 850s) with 4 gigs of RAM and no second CPU installed.

Regards,
Rick
Re: Feeding SA-learn
On 2008-01-23, Anthony Peacock [EMAIL PROTECTED] wrote:
>> My intention was to manually feed the few spam messages that slip
>> through undetected. By the time I get hold of those, they are in the
>> recipient's mail client inbox, not on the server. I was thinking, if I
>> save the mail as EML files, would that preserve the headers in a way
>> that sa-learn can parse correctly?
> Depends on the client. For instance, Thunderbird stores its folders in
> mbox format, so sa-learn can work against those files as-is. Other email
> clients can save emails in text format complete with headers. The
> biggest problem with this is training the users to do that consistently.

Isn't that what cron is for? :-) I have a cron job on my IMAP server to regularly feed ham and spam through sa-learn.

-- 
John ([EMAIL PROTECTED])
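A setup like John describes might look like the crontab sketch below. The folder paths and the user are entirely hypothetical (they depend on the IMAP server's mail store layout); the sa-learn options are real:

```
# Hypothetical crontab entries: nightly Bayes training from shared
# IMAP folders that users drag misclassified mail into.
30 3 * * *  sa-learn --spam --mbox /var/imap/shared/LearnSpam >/dev/null
45 3 * * *  sa-learn --ham  --mbox /var/imap/shared/LearnHam  >/dev/null
```

For Maildir-style stores you would point sa-learn at the directory of individual message files instead of using --mbox.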
RE: whois plugin .. where to get it
-----Original Message-----
From: ram [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 21, 2008 2:36 PM

...omissis...

Again, no registrar check, sorry. You could use the uri_whois nsname or uri_whois nsaddr tests to attempt to catch these.

I think I am missing something here. The NS address is different from the registrar.

Right, it is. URIWhois does not detect the registrar. It detects the name and the address of the DNS- and whois-defined NSes for that domain.

How can we score based on NS address? Can a spammer not put innocent servers as his nameserver, as long as they allow DNS queries to his host?

I guess there is a "not" in excess in your question. If you are asking "can a spammer put an innocent server as his nameserver", then the answer is: yes, he can, but this wouldn't help with the URIWhois plugin. What I meant in my scarce previous reply was that the URIWhois plugin builds a list of the nameservers' names and addresses related to the URIs SA finds in the message. These names and addresses are built by merging the ones discovered through DNS queries with the ones discovered through whois queries. Then, your rules may attempt to match DNS server names or addresses from this list. You would of course match the list against names and addresses which are well known to be bad, not against any innocent one.

There are many use cases for the URIWhois plugin. However, spammers, like everybody else, often want full control over the authoritative NSes of their domains. Thereby, you may easily protect yourself even from future spam by creating a URIWhois rule matching the bunch of addresses in which the whois-published nameservers sit: that address space is often assigned to a single entity which I bet is less than innocent. For example, whois says beekeenidotcom is handled by 210.14.128.172 and 210.14.128.112. These NSes are in the 210.14.128.0/13 address space, which is assigned to a Chinese company.
When I attempt to access its site I even get a warning from my antivirus. OK, I decide they are basically spammers. Then I put this rule in my URIWhois.cf:

  uri_whois SPAMDNSADDR nsaddr in 210.14.128.172/13
  score     SPAMDNSADDR {TheScoreILike}

and that's it: sites whose authoritative DNS servers are in these addresses get a score, regardless of the NS name and any DNS redirection attempt by them. Of course, I can also group together more bunches of addresses:

  uri_whois SPAMDNSADDR nsaddr in 210.14.128.172/13 1.1.1.0/24 2.2.2.0/24

etc. This way I don't need an SA rule for each and every bunch, as long as I want to score them the same.

Anyway, apart from matching the DNS names and addresses published through whois records, I recently discovered that most spam domains don't respond to SOA and NS requests! This is quite easy to detect with very low CPU consumption (i.e., asynchronously), at the cost that the scan has to wait for at least a couple of DNS request timeouts before asserting that SOA and/or NS replies are missing. By the way, RFC 1035 states that SOA and NS requests MUST be answered by an authoritative nameserver. Try this:

  dig soa beekeenidotcom ...

Unfortunately this detection is not yet implemented in the URIWhois RFC1035IGN rule.

The format of the registrar in whois information is not standardized. I wonder why. If I could do something like "dig domain.tld REG" (just like "dig domain.tld MX"), then life would have been so simple.

The main problem here is that, when ICANN delegates control of gTLD zones, there is not even a word in its agreements about public access to registration records. The gTLDs' Network Information Centers are basically free to do whatever they like with their gTLD, provided they implement something to let ICANN inspect their records. ICANN, not you or me...

Thanks Ram

You're welcome, Giampaolo
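Giampaolo's missing-SOA/NS observation can be checked by hand with something like the sketch below. It is a rough illustration, not the (unimplemented) RFC1035IGN rule: the helper just counts non-empty lines of `dig +short` output, which prints one line per answer record and nothing when there is no answer.

```shell
#!/bin/sh
# Count non-empty lines on stdin.  Used on `dig +short` output below.
count_answers() {
    grep -c .
}

# Flag a domain whose authoritative servers ignore SOA or NS queries,
# which RFC 1035 says an authoritative nameserver must answer.
check_domain() {
    soa=$(dig +short soa "$1" 2>/dev/null | count_answers)
    ns=$(dig +short ns "$1" 2>/dev/null | count_answers)
    if [ "$soa" -eq 0 ] || [ "$ns" -eq 0 ]; then
        echo "$1: no SOA/NS answer (suspicious per RFC 1035)"
    fi
}

# Example (needs network access):  check_domain example.com
```

A real implementation would run such lookups asynchronously, as the post notes, since each missing reply costs a DNS timeout.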
Re: Feeding SA-learn
On 2008-01-23, Diego Pomatta [EMAIL PROTECTED] wrote: I use Thunderbird. There are two files for that folder: Junk.msf (7k) and Junk (53.172k). The msf file must be some kind of index. I just feed the biggest one to sa-learn? Yup. Use sa-learn --spam --mbox Junk to learn your spam. You'll want to use the --mbox switch so sa-learn will process it as an mbox format mailbox, since that's what Thunderbird uses to store mail. -- John ([EMAIL PROTECTED])
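Since Thunderbird keeps one mbox file per folder plus a .msf index next to it, a small sketch like this can pick out just the mbox files before handing them to sa-learn. The profile path is hypothetical, and the sa-learn calls are printed rather than executed so the sketch stays a dry run.

```shell
#!/bin/sh
# Hypothetical profile path -- Thunderbird profiles live under ~/.thunderbird,
# but the actual profile directory name varies per installation.
MAILDIR="${MAILDIR:-$HOME/.thunderbird/default/Mail/Local Folders}"

# List mbox files in a folder directory, skipping .msf indexes and dotfiles.
list_mboxes() {
    find "$1" -maxdepth 1 -type f ! -name '*.msf' ! -name '.*' 2>/dev/null
}

# Dry run: show the sa-learn invocation for each folder file.
list_mboxes "$MAILDIR" | while IFS= read -r mbox; do
    printf 'sa-learn --spam --mbox "%s"\n' "$mbox"
done
```

In practice you would only feed the Junk folder as --spam and known-good folders as --ham, rather than everything found.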
Re: Feeding SA-learn
On 2008-01-23, Mark Johnson [EMAIL PROTECTED] wrote:

My emails are stored on an IMAP server and what you suggested wasn't [...]. I use Thunderbird as my mail client but have found that I needed to use Evolution to save the messages in mbox format, which was always a hassle.

mbox is already the format in which Thunderbird stores mail. What was the problem that caused you to use Evolution?

Because the emails are kept on the IMAP server and are not local, I had to enable "Select this folder for offline use" on the Offline tab of the folder properties. I then had the mbox file that I could copy off.

If you have shell access to the machine running the imap server, use a cron job on the server to feed your Junk into spamassassin.

-- John ([EMAIL PROTECTED])
Re: Feeding SA-learn
John Thompson wrote:

Isn't that what cron is for? :-) I have a cron job on my imap server to regularly feed ham and spam through sa-learn.

Do you delete the messages from the IMAP folder after you learn them? If so, how do you go about that? I'm pretty sure that if I deleted the mail files from the command line, I'd have to run a reconstruct on the mailbox or the folder would throw errors on the client. This is on a Cyrus IMAP server. Thanks!

-- Mark Johnson http://www.astroshapes.com/information-technology/blog/
Re: Expiry problem
Steven Stern wrote:

We had a server go crazy last night and reset its date into August of 2277. In any case, we've resolved that, but now I can't get bayes to expire. After the clock was correctly set, I deleted all tokens that had a lastupdate in the future, and also removed similar bayes_seen rows. I then reset the token count in bayes_vars to the correct value. When I try to run sa-learn --force-expire, nothing gets expired and the token list keeps growing. Will this get better on its own or do I need to intervene?

You might need to ditch your bayes database.

The database will, over time, partially fix itself, but right now any one-off tokens learned while the date was off are stuck in your bayes DB until 2277. SA's expiry method is based on the age of a token, i.e. when it was last accessed. That method has absolutely no way to deal with atimes that are in the future, so it will never try to expire those tokens.

It can partially fix itself because every time a token gets accessed, its atime gets updated. So as the more common tokens get used, they'll start rotating out as they would normally. However, any unique tokens are stuck there.

If you're *really* desperate to preserve the bayes DB, you could wait a couple of days, do an sa-learn --backup, use grep to remove all the lines with absurd atimes, then use sa-learn --restore. That's a good bit of work to go through...

If you decide to go this route: for reference, and assuming my scratchpad math is right, the atimes for 2277 should be around 9.6 billion, while the ones for 2008 should be around 1.2 billion. Of course, that's assuming the atimes are stored as 64-bit values and aren't wrapping as 32-bit numbers. However, if that were the case, they'd be wrapping to 2004, and your expire numbers should show really high token eliminations, not really low ones.
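The backup/filter/restore route Matt describes could be sketched as below. The token-line format assumed here (tab-separated "t spam ham atime token" lines) is my understanding of an sa-learn --backup dump, so verify it against a real dump before trusting the filter; the cutoff value is likewise a hypothetical placeholder for "a timestamp safely in the present".

```shell
#!/bin/sh
# Hypothetical cutoff: anything with an atime after this is from "the future".
CUTOFF=1500000000

# Keep non-token lines (headers, seen entries) untouched, and keep token
# lines ("t" records) only when their atime (assumed to be field 4) is sane.
drop_future_tokens() {
    awk -v cutoff="$CUTOFF" '
        $1 != "t"     { print; next }   # pass through non-token lines
        $4 <= cutoff  { print }         # keep tokens with plausible atimes
    '
}

# The full pipeline would then be roughly:
#   sa-learn --backup | drop_future_tokens > bayes.filtered
#   sa-learn --clear
#   sa-learn --restore bayes.filtered
```

Per Matt's scratchpad math, 2277-era atimes land around 9.6 billion, so any sane present-day cutoff separates them cleanly.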
Re: Expiry problem
On 01/23/2008 07:35 PM, Matt Kettler wrote:

| Steven Stern wrote: We had a server go crazy last night and reset its date into August of 2277. [...] When I try to run sa-learn --force-expire, nothing gets expired and the token list keeps growing. Will this get better on its own or do I need to intervene?
|
| You might need to ditch your bayes database. [...] If you're *really* desperate to preserve the bayes DB, you could wait a couple of days, do an sa-learn --backup, use grep to remove all the lines with absurd atimes, then use sa-learn --restore. [rest of reply quoted in full in the previous message]
It's finally started to remove tokens, so I think I'm OK. We use SQL bayes, so it was an easy matter to use

  delete from bayes_token where atime > UNIX_TIMESTAMP();

to clean up the stuff from the future.

-- Steve
Re: whois plugin .. where to get it
Giampaolo Tomassoni wrote:

Right, it is. URIWhois does not detect the registrar. It detects the name and the address of the DNS- and whois-defined NSes for that domain.

So how is this substantially different from the URIDNSBL plugin that comes with SA? Bear in mind that plugin *DOES* resolve the NSes for the domain, and DOES check those too. Take for example URIBL_SBL, which only makes sense in the context of the IPs of the nameservers (since it's an IP-based RBL). I guess you could say that looking up the IP of the host in the URL would also work, but that's an invitation for DoS, so it's not something URIDNSBL does.

The only big difference I see at face value is that it uses whois instead of DNS to find the NS records... that hardly seems efficient.
Re: whois plugin .. where to get it
Quoting Matt Kettler [EMAIL PROTECTED]: The only big difference I see at face value is it uses whois instead of DNS to find the NS records.. that hardly seems efficient.. Whois is definitely the wrong protocol to use for automated testing, especially for any high volumes. It was not designed or intended for that purpose, which is arguably abusive. Jeff C.