Re: getting Bayes token data from spamassassin
Michael Parker writes:
Stuart Robinson wrote:
Hello, all.
On Mon, Jan 15, 2007 at 01:54:07AM -0800, Stuart Robinson wrote:
I've searched around a bit, both on gmane and Google, but I haven't found much more information regarding your two points. What IS stored in the token field of the table bayes_token? And how is the SHA1 hash involved?
A SHA1 hash is taken of the original token value, and the bottom 40 bits are used as the token from then on. There is a plugin call which can be used to store raw token - hash value data, but otherwise the raw token information is lost after the message is processed.
Where could I find more information about the plugin call that allows me to do this?
perldoc Mail::SpamAssassin::Plugin
In particular:
    http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_scan
    http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_learn
You should also search the dev list from a couple of years ago at least. Lots of discussion about the change and why it was done including, if memory serves me correctly, a proof of concept plugin to save off the token values.
by the way, a nice, working plugin that does this would be quite useful on the CustomPlugins wiki page, or contributed as an optional plugin...
--j.
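The hashing step described in this message can be sketched in Python. This is only an illustration and not SpamAssassin's own code: it assumes that "the bottom 40 bits" means the last five bytes of the 20-byte SHA1 digest, and the function name `token_key` is made up for this example.

```python
import hashlib


def token_key(raw_token: str) -> bytes:
    """Return a 40-bit (5-byte) key derived from the SHA1 of a raw token.

    Assumption: "bottom 40 bits" is read as the last five bytes of the
    digest. Check the SpamAssassin source before relying on this.
    """
    digest = hashlib.sha1(raw_token.encode("utf-8")).digest()  # 20 bytes
    return digest[-5:]


if __name__ == "__main__":
    # The raw token cannot be recovered from the key, which is why the
    # thread talks about a plugin that records the mapping separately.
    for tok in ("viagra", "meeting"):
        print(tok, token_key(tok).hex())
```

Because only five bytes of the digest survive, the stored value is one-way: the original word can be looked up again only if something recorded the pairing at scan or learn time.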
restart sequence after changing configuration
As I understand the situation when using amavis/spamassassin/postfix, the flow of a message is that it is received by postfix, passed to amavisd, from there to clamav, then to spamassassin and then back to postfix. My questions are:
1. is this correct?
2. if I change spamassassin's local.cf or user_prefs, what needs to be restarted and in what sequence?
3. if I change amavisd.conf what needs restarting/reloading, etc and in what sequence?
Thanks, mike
Re: restart sequence after changing configuration
Mike Kenny wrote:
As I understand the situation when using amavis/spamassassin/postfix, the flow of a message is that it is received by postfix, passed to amavisd, from there to clamav, then to spamassassin and then back to postfix. My questions are:
1. is this correct?
2. if I change spamassassin's local.cf or user_prefs, what needs to be restarted and in what sequence?
3. if I change amavisd.conf what needs restarting/reloading, etc and in what sequence?
amavisd-new loads the spamassassin perl modules itself; if you are running amavisd-new you don't need to run spamd. Restarting amavisd-new will cause local.cf to be re-read. However, much of user_prefs is not used by amavis, so you should look at the settings in amavisd.conf first. You can restart amavisd-new without restarting postfix.
--
Robert Brooks, Network Manager, Cable Wireless UK [EMAIL PROTECTED] http://wtg.cw.com/ Tel: +44 (0)20 7339 8600 Fax: +44 (0)20 7339 8601
- What was your username again? - BOFH-
spamc -d failover question
If you use spamc -d host1,host2,host3 and it connects to host1, but host1 has hit its max number of connections, will it fail over to host2 or will it just fail? Thanks in advance.
Re: sa-update failing - I think dev just went live again?
Jason Haar writes:
Similar to the issue found Jan 1 2007, I am currently seeing
    # sa-update
    error: GPG validation failed!
    The update downloaded successfully, but it was not signed with a trusted GPG key. Instead, it was signed with the following keys:
    24F434CE
    Perhaps you need to import the channel's GPG key?
There is no 24F434CE key that I can find, and the thread titled "SA-UPDATE and recent branches/3.1 rules?" seems to imply this is a fault that can happen with some process on updates.spamassassin.org?
24F434CE is the active subkey of 5244EC45:
    : jm 899...; gpg -v 494178.tar.gz.asc
    gpg: armor header: Version: GnuPG v1.4.2 (SunOS)
    gpg: assuming signed data in `494178.tar.gz'
    gpg: Signature made Mon Jan 8 19:47:19 2007 GMT using RSA key ID 24F434CE
    gpg: using subkey 24F434CE instead of primary key 5244EC45
    gpg: using classic trust model
    gpg: Good signature from updates.spamassassin.org Signing Key [EMAIL PROTECTED]
    gpg: WARNING: This key is not certified with a trusted signature!
    gpg: There is no indication that the signature belongs to the owner.
    Primary key fingerprint: 5E54 1DC9 59CB 8BAC 7C78 DFDC 4056 A61A 5244 EC45
    Subkey fingerprint: 0C2B 1D71 75B8 52C6 4B3C DC71 6C55 3978 24F4 34CE
    gpg: binary signature, digest algorithm SHA1
sounds like your sa-update key info got lost somehow?
--j.
Re: getting Bayes token data from spamassassin
On Tue, Jan 16, 2007 at 10:21:14AM +, Justin Mason wrote:
by the way, a nice, working plugin that does this would be quite useful on the CustomPlugins wiki page, or contributed as an optional plugin...
The plugin itself is pretty trivial -- the question is: what to do with the token information? Should it be sent out to a flat file, kept in a DBM, etc? That's where the non-trivial stuff happens.
--
Randomly Selected Tagline: ... then you'll excuse me, but I'm in the middle of fifteen things, all of them annoying. - Ivanova, Babylon 5 (Midnight on the Firing Line)
RE: Checking for PTR record
I use this header rule for the same thing:
    header   LOCAL_INVALID_PTR2 Received =~ /from \S+ \(unknown /
    score    LOCAL_INVALID_PTR2 0.8
    describe LOCAL_INVALID_PTR2 Header contains no PTR2
Robert
Peace he would say instead of goodbye, peace my brother.
-----Original Message-----
From: Peter Smith [mailto:[EMAIL PROTECTED]
Sent: Saturday, January 13, 2007 6:05 PM
To: users@spamassassin.apache.org
Subject: Checking for PTR record
Hi, I'm interested in filtering mail relayed from hosts which have no reverse DNS (PTR) record. A lot of MTAs support rejection of mail from such hosts, but I feel this will reject too much genuine mail, so I'm looking to approach the problem via Spam Assassin - perhaps score 1 or 2 for such mail. I was surprised that Spam Assassin doesn't already have a rule for this, or that a plugin has not been written; or perhaps I'm looking in the wrong place? I'm aware of the tests Spam Assassin performs for PTR records for hotmail, excite, mail.com etc, but there don't seem to be any general rules.
Thanks, Pete
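To see what the LOCAL_INVALID_PTR2 pattern in this message does and does not match, here is a small Python sketch. It assumes, as the rule itself does, that the receiving MTA writes `unknown` in place of the hostname when the reverse lookup fails. The two Received headers are invented for the example (documentation addresses and example.com names), and `lacks_ptr` is a hypothetical helper name.

```python
import re

# Same pattern as the LOCAL_INVALID_PTR2 rule quoted in the message.
NO_PTR = re.compile(r"from \S+ \(unknown ")


def lacks_ptr(received_header: str) -> bool:
    """True if the Received header looks like it came from a host
    whose reverse DNS lookup returned nothing."""
    return NO_PTR.search(received_header) is not None


# Hypothetical headers, made up for illustration only.
WITH_PTR = "from mail.example.com (mail.example.com [192.0.2.10]) by mx.example.net"
WITHOUT_PTR = "from mail.example.com (unknown [192.0.2.10]) by mx.example.net"

if __name__ == "__main__":
    print(lacks_ptr(WITH_PTR), lacks_ptr(WITHOUT_PTR))
```

Note that the rule tests every Received header, so a hop further back in the chain can trigger it even when the host that delivered to your own server has a valid PTR record.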
Re: A bit OT Question
Chris wrote:
On Monday 15 January 2007 7:07 pm, John D. Hardin wrote:
On Mon, 15 Jan 2007, Chris wrote:
I keep seeing the below bounces in spam reports I'm sending out. I know the ordb is dead; who is using it, Earthlink or corp.mailsecurity.net.au?
    [EMAIL PROTECTED] SMTP error from remote mailer after RCPT TO:[EMAIL PROTECTED]: host corp.mailsecurity.net.au [67.18.110.234]: 451 4.7.1 Temporary lookup failure of 209.86.89.61 at relays.ordb.org: retry timeout exceeded
corp.mailsecurity.net.au is using it. They should notice it pretty quickly now that *all* of their inbound mail is being TMPFAILed... :)
spam is way down this week. Did you do something to the server?
Thanks John, I shot an email off to their tech contact letting them know about relays.ordb.org being shut down; hopefully it will be fixed.
You sent them an email to tell them their email is down? Good luck with that :-)
Re: sa-update failing - I think dev just went live again?
Justin Mason wrote:
24F434CE is the active subkey of 5244EC45: ... sounds like your sa-update key info got lost somehow?
Yeah. This is a CentOS-4 server I installed yesterday. Looks like something went wrong with it. The error message says:
    Perhaps you need to import the channel's GPG key? For example:
    wget http://spamassassin.apache.org/updates/GPG.KEY
    gpg --import GPG.KEY
That doesn't fix the problem. However, replacing the last line with:
    sa-update --import GPG.KEY
does fix it. Perhaps that should be changed? Anyway, all better now - thanks!
--
Cheers
Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
ClamAV plugin with 3.1.17
I just upgraded to 3.1.7 and I just wanted to make sure that clamav is still run via the pm file and not somehow enabled via a .pre file. Thanks -Brent
RE: ClamAV plugin with 3.1.17
Whoops, I meant .cf file, not pm file. I just wanted to make sure this is still accurate: http://wiki.apache.org/spamassassin/ClamAVPlugin
-----Original Message-----
From: Brent Kennedy [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 16, 2007 3:18 PM
To: users@spamassassin.apache.org
Subject: ClamAV plugin with 3.1.17
I just upgraded to 3.1.7 and I just wanted to make sure that clamav is still run via the pm file and not somehow enabled via a .pre file. Thanks -Brent
Re: sa-update failing - I think dev just went live again?
Jason Haar wrote:
Justin Mason wrote:
24F434CE is the active subkey of 5244EC45: ... sounds like your sa-update key info got lost somehow?
Yeah. This is a CentOS-4 server I installed yesterday. Looks like something went wrong with it. The error message says:
    Perhaps you need to import the channel's GPG key? For example:
    wget http://spamassassin.apache.org/updates/GPG.KEY
    gpg --import GPG.KEY
That doesn't fix the problem. However, replacing the last line with:
    sa-update --import GPG.KEY
does fix it. Perhaps that should be changed?
It was fixed a while ago; there just hasn't been a release since it was fixed.
Daryl
Re: getting Bayes token data from spamassassin
On Tue, Jan 16, 2007 at 10:21:14AM +, Justin Mason wrote:
by the way, a nice, working plugin that does this would be quite useful on the CustomPlugins wiki page, or contributed as an optional plugin...
The plugin itself is pretty trivial -- the question is: what to do with the token information? Should it be sent out to a flat file, kept in a DBM, etc? That's where the non-trivial stuff happens.
Couldn't the raw tokens just be kept in the same database by adding an additional column to the table bayes_token that isn't indexed? That wouldn't affect performance too much, would it?
Stuart Robinson
Email: stuart at zapata dot org
Homepage: http://www.zapata.org/stuart
Re: getting Bayes token data from spamassassin
On Tue, Jan 16, 2007 at 02:02:01PM -0800, Stuart Robinson wrote:
Couldn't the raw tokens just be kept in the same database by adding an additional column to the table bayes_token that isn't indexed? That wouldn't affect performance too much, would it?
Besides requiring a new data layout for the DBM files, it would be space-wise inefficient. This issue was discussed to death a few years ago. If you want this kind of thing on your specific server, the plugin call allows you to write whatever you want to deal with it. I'd probably keep a new table w/ hash-raw token mappings, or some kind of DBM.
--
Randomly Selected Tagline: Do not meddle in the affairs of wizards, for you are crunchy and good with ketchup. - Unknown
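The "new table w/ hash-raw token mappings" idea can be sketched as follows. This is a Python illustration of the table design only: a real implementation would be a Perl plugin hooked into the bayes_scan and bayes_learn calls mentioned earlier in the thread. The table name `raw_token_map`, the helper names, and the reading of "bottom 40 bits" as the last five digest bytes are all assumptions made for this example.

```python
import hashlib
import sqlite3


def token_key(raw_token: str) -> bytes:
    # Assumption: the stored key is the last five bytes of the SHA1 digest.
    return hashlib.sha1(raw_token.encode("utf-8")).digest()[-5:]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a side table mapping hashed keys to raw tokens."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_token_map ("
        " token BLOB PRIMARY KEY,"
        " raw   TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, raw_tokens) -> None:
    """Record each raw token once; repeats are ignored."""
    rows = [(token_key(t), t) for t in raw_tokens]
    conn.executemany(
        "INSERT OR IGNORE INTO raw_token_map (token, raw) VALUES (?, ?)", rows
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, key: bytes):
    """Return the raw token for a hashed key, or None if it was never seen."""
    row = conn.execute(
        "SELECT raw FROM raw_token_map WHERE token = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the mapping in its own table leaves bayes_token untouched, so the existing data layout and its size stay as they are; the cost is one extra write per previously unseen token.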
Re: getting Bayes token data from spamassassin
Thanks. Once I have this all figured out, I will write up something and put it on my homepage and post a link to it here.
A SHA1 hash is taken of the original token value, and the bottom 40 bits are used as the token from then on. There is a plugin call which can be used to store raw token - hash value data, but otherwise the raw token information is lost after the message is processed.
Where could I find more information about the plugin call that allows me to do this?
perldoc Mail::SpamAssassin::Plugin
In particular:
    http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_scan
    http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_learn
You should also search the dev list from a couple of years ago at least. Lots of discussion about the change and why it was done including, if memory serves me correctly, a proof of concept plugin to save off the token values.
by the way, a nice, working plugin that does this would be quite useful on the CustomPlugins wiki page, or contributed as an optional plugin...
Stuart Robinson
Email: stuart at zapata dot org
Homepage: http://www.zapata.org/stuart
looking for repository of spam email content
Can anyone recommend a site that provides examples of content from spam email? I am a researcher looking for content from the 50 most recent or most prevalent spams. Is a site like http://spamosphere.wordpress.com/ the best source for this? Thanks! Andrea
Re: bayes poisoning
maillist wrote:
I see a few emails every now and then about bayes poisoning, and am wondering what it means. From what I understand, it is some message that gets learned (only through autolearn?) that has certain characteristics that throw the bayes system off.
From what I've seen there are generally two ways it is referred to:
1. random text or phrases thrown into spam to make spam and ham look more alike: This is an imagined problem.
2. spam incorrectly learned as ham or ham incorrectly learned as spam: Enough of these (either from manual or auto learning) and your Bayes database will be useless.
--
Chris
RE: looking for repository of spam email content
Don't worry, you just posted to a mailing list that is mirrored everywhere. You will get your spam shortly. ;-) - This email has been scanned and certified safe by SpammerTrap(tm) For Information please see http://www.spammertrap.com
This score makes no sense
I just got this in my inbox. If clamav scores 10, how can this be marked as not spam?
    X-Spam-Virus: Yes (Html.Phishing.Fake.Sanesecurity.06080201)
    X-Spam-Seen: Tokens 75
    X-Spam-New: Tokens 126
    X-Spam-ASN: AS14492 66.70.0.0/17
    X-Spam-Remote: Host localhost.localdomain
    X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cpollock.localdomain
    X-Spam-Hammy: Tokens 8
    X-Spam-Status: No, score=4.5 required=5.0 tests=BAYES_05,CLAMAV, HTML_IMAGE_ONLY_16,HTML_MESSAGE,MIME_HTML_ONLY,REPLY_TO_EMPTY autolearn=disabled version=3.1.7
    X-Spam-Spammy: Tokens 1
    X-Spam-Pyzor: Reported 0 times.
    X-Spam-DCC: sonic.net cpollock 1156; Body=65 Fuz1=65 Fuz2=216
    X-Spam-Untrusted: Relays [ ip=66.70.114.30 rdns=postmaster.costapacific.net helo=postmaster.costapacific.net by=mx-nebolish.atl.sa.earthlink.net ident= envfrom= intl=0 id=1h6ZgC39e3Nl3490 auth= ]
    X-Spam-Level:
    X-Spam-RBL: Results dns:www.costapacific.net [66.70.114.35]
    Status: U
    Return-Path: [EMAIL PROTECTED]
    Received: from pop.earthlink.net [209.86.93.201] by localhost with POP3 (fetchmail-6.2.5) for [EMAIL PROTECTED] (single-drop); Tue, 16 Jan 2007 19:07:36 -0600 (CST)
    Received: from postmaster.costapacific.net ([66.70.114.30]) by mx-nebolish.atl.sa.earthlink.net (EarthLink SMTP Server) with SMTP id 1h6ZgC39e3Nl3490 for [EMAIL PROTECTED]; Tue, 16 Jan 2007 20:07:06 -0500 (EST)
    Received: (qmail 6432 invoked by uid 80); 17 Jan 2007 00:56:51 -
    Date: 17 Jan 2007 00:56:51 -
    Message-ID: [EMAIL PROTECTED]
    To: [EMAIL PROTECTED]
    Subject: Your TD Canada EasyWeb Online Account is about to expire!
    From: TD Canada [EMAIL PROTECTED]
    Reply-To:
    MIME-Version: 1.0
    Content-Type: text/html
    Content-Transfer-Encoding: 8bit
    X-ELNK-Info: sbv=0; sbrc=.0; sbf=00; sbw=000;
    X-SenderIP: 66.70.114.30
    X-ASN: ASN-14492
    X-CIDR: 66.70.0.0/17
    X-UID: 25573
    X-Length: 3148
--
Chris KeyID 0xE372A7DA98E6705C http://learn.to/quote
Which is more efficient: two regexp's or one regexp with alternation?
If I want to block subjects matching foo or bar, is it more efficient to write two regexps or a single foo|bar regexp? I'd think a single regexp is more efficient, but SpamAssassin ships w/ rule-sets that have multiple rules. Given how many spams people get, even a small improvement in efficiency would help. Or are there multiple rules because the scores are different and have to be optimized differently?
--
We're just a Bunch Of Regular Guys, a collective group that's trying to understand and assimilate technology. We feel that resistance to new ideas and technology is unwise and ultimately futile.
Re: This score makes no sense
On Tue, Jan 16, 2007 at 07:14:05PM -0600, Chris wrote:
I just got this in my inbox. If clamav scores 10, how can this be marked as not spam?
    X-Spam-Status: No, score=4.5 required=5.0 tests=BAYES_05,CLAMAV, HTML_IMAGE_ONLY_16,HTML_MESSAGE,MIME_HTML_ONLY,REPLY_TO_EMPTY autolearn=disabled version=3.1.7
My guess is that CLAMAV doesn't score 10. Doing a quick calculation, it seems to score ~4.5.
--
Randomly Selected Tagline: Programming is like sex. One mistake and you have to support it for the rest of your life. - Michael Sinz
Re: Which is more efficient: two regexp's or one regexp with alternation?
On Tue, Jan 16, 2007 at 06:23:48PM -0700, Kelly Jones wrote:
If I want to block subjects matching foo or bar, is it more efficient to write two regexps or a single foo|bar regexp?
It'll be more efficient to do a single regexp.
Or are there multiple rules because the scores are different and have to be optimized differently?
Yes. Rules have different hit rates, so they need to get different scores because it's better for overall efficacy.
--
Randomly Selected Tagline: Illiterate? Write to us for a free brochure.
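The trade-off in this answer can be illustrated with a short Python sketch. It shows only the logic, not a benchmark: SpamAssassin rules run under Perl's regex engine, so no timing conclusions should be drawn from it. The rule names `SUBJ_FOO` and `SUBJ_BAR` and their scores are hypothetical.

```python
import re

# One rule with alternation, as in the question.
EITHER = re.compile(r"foo|bar")

# Two separate rules. The scores are invented, chosen only to show that
# separate rules can carry separate weights.
SEPARATE = {
    "SUBJ_FOO": (re.compile(r"foo"), 1.0),
    "SUBJ_BAR": (re.compile(r"bar"), 2.5),
}


def hits_single(subject: str) -> bool:
    """One pass over the subject; yields a single yes/no answer."""
    return EITHER.search(subject) is not None


def score_separate(subject: str) -> float:
    """One pass per rule; each hit adds that rule's own score."""
    return sum(score for rx, score in SEPARATE.values() if rx.search(subject))


if __name__ == "__main__":
    for s in ("buy foo now", "bar special", "foo and bar", "hello"):
        print(s, hits_single(s), score_separate(s))
```

The combined pattern says only whether either word appeared, so it can carry just one score. Splitting the rules costs an extra scan but lets each word be weighted by how reliably it indicates spam.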
Re: This score makes no sense
On Tuesday 16 January 2007 8:29 pm, Theo Van Dinter wrote:
On Tue, Jan 16, 2007 at 07:14:05PM -0600, Chris wrote:
I just got this in my inbox. If clamav scores 10, how can this be marked as not spam?
    X-Spam-Status: No, score=4.5 required=5.0 tests=BAYES_05,CLAMAV, HTML_IMAGE_ONLY_16,HTML_MESSAGE,MIME_HTML_ONLY,REPLY_TO_EMPTY autolearn=disabled version=3.1.7
My guess is that CLAMAV doesn't score 10. Doing a quick calculation, it seems to score ~4.5.
Odd, because if I run it through sa-learn --spam, spamassassin -r and spamassassin -t it comes out as:
    Content analysis details: (19.3 points, 5.0 required)
    pts  rule name           description
    ---- ------------------- ------------------------------------------------
    0.6  REPLY_TO_EMPTY      Reply-To: is empty
    0.0  HTML_MESSAGE        BODY: HTML included in message
    5.0  BAYES_99            BODY: Bayesian spam probability is 99 to 100% [score: 1.]
    0.0  MIME_HTML_ONLY      BODY: Message only has text/html MIME parts
    0.5  HTML_IMAGE_ONLY_16  BODY: HTML: images with 1200-1600 bytes of words
    2.2  DCC_CHECK           Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
    10   CLAMAV              Clam AntiVirus detected a virus
    1.0  SAGREY              Adds 1.0 to spam from first-time senders
Guess it's just another mystery of life as to why it didn't get the 10 points the first time for the clamav hit.
--
Chris KeyID 0xE372A7DA98E6705C http://learn.to/quote
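The arithmetic behind both runs can be checked with a few lines of Python. The per-rule scores are copied from the report in this message. The one assumption is that the four rules common to both runs carried the same scores the first time; the thread does not show the BAYES_05 score, so the first-run remainder cannot be split between BAYES_05 and CLAMAV from these numbers alone.

```python
# Per-rule scores from the second run's "Content analysis details" report.
SECOND_RUN = {
    "REPLY_TO_EMPTY": 0.6,
    "HTML_MESSAGE": 0.0,
    "BAYES_99": 5.0,
    "MIME_HTML_ONLY": 0.0,
    "HTML_IMAGE_ONLY_16": 0.5,
    "DCC_CHECK": 2.2,
    "CLAMAV": 10.0,
    "SAGREY": 1.0,
}

# Total reported in the first run's X-Spam-Status header (already rounded).
FIRST_RUN_TOTAL = 4.5

# Rules that hit in both runs and whose scores are known from the report.
SHARED = ("REPLY_TO_EMPTY", "HTML_MESSAGE", "MIME_HTML_ONLY", "HTML_IMAGE_ONLY_16")

# Sum of the second run; should agree with the "19.3 points" line.
second_total = round(sum(SECOND_RUN.values()), 1)

# What is left in the first run for BAYES_05 and CLAMAV together,
# assuming the shared rules scored the same in both runs.
first_remainder = round(FIRST_RUN_TOTAL - sum(SECOND_RUN[r] for r in SHARED), 1)

if __name__ == "__main__":
    print("second run total:", second_total)
    print("first run, BAYES_05 + CLAMAV combined:", first_remainder)
```

The second run adds up to the reported 19.3. In the first run only about 3.4 points are left for BAYES_05 and CLAMAV combined, so CLAMAV cannot have contributed 10 unless BAYES_05 was strongly negative.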
Re: A bit OT Question
On Tuesday 16 January 2007 8:57 am, Dave Williss wrote:
corp.mailsecurity.net.au is using it. They should notice it pretty quickly now that *all* of their inbound mail is being TMPFAILed... :)
spam is way down this week. Did you do something to the server?
Thanks John, I shot an email off to their tech contact letting them know about relays.ordb.org being shut down, hopefully it will be fixed.
You sent them an email to tell them their email is down? Good luck with that :-)
Yep, and they not only replied, but the problem is fixed. I sent the link to the slashdot article about relays.ordb.org going down.
--
Chris KeyID 0xE372A7DA98E6705C http://learn.to/quote