Re: Reporting scams to fraudwatchinternational
On Sunday 01 May 2005 07:46 am, Chris wrote: Nope, I finally managed to get an email off to the tech-support contact to whom the domain is registered. I'll have to see what happens from there. Chris Found out the entire fraudwatchinternational site had been down for over 32 hours. It appears to be mostly back up now. I've just forwarded a PayPal phish to them and will see what happens. -- Chris Registered Linux User 283774 http://counter.li.org 20:07:30 up 4 days, 14:09, 1 user, load average: 0.28, 0.25, 0.19 Mandriva Linux 10.1 Official, kernel 2.6.8.1-12mdk If you can lead it to water and force it to drink, it isn't a horse.
Re: Folder redirection
Apologies if I missed this rather simple question in the FAQs, but I really did look. Good reason for you to miss it: it isn't there. SA doesn't route messages; it only filters them. Something else looks at the filtering result and decides what to do with the message. As someone else suggested, this may be procmail on your system. Depending on exactly what you are using for mail, it could also be one of several other things. You may have to look around in your mail system to find where the routing is really occurring. Loren
Re: Folder redirection
Loren, Greg, Thanks. The problem was that I had set up a single-user config prior to the system-wide config. Greg, my system-wide config looks almost exactly as you described. Loren, your comment about procmail doing the move reminded me that the single-user setup included a .procmailrc in $HOME. That file, of course, had the redirection in it. I'm set. Thanks again, Mark Loren Wilton wrote: Apologies if I missed this rather simple question in the FAQs, but I really did look. Good reason for you to miss it: it isn't there. SA doesn't route messages; it only filters them. Something else looks at the filtering result and decides what to do with the message. As someone else suggested, this may be procmail on your system. Depending on exactly what you are using for mail, it could also be one of several other things. You may have to look around in your mail system to find where the routing is really occurring. Loren -- Mark Harwood www.MarkHarwood.com
Re: The highest score?
I cheat. I have a couple personal rules guaranteed to hit spam and no ham whatsoever. They hit 100. MOM Agent is guaranteed spam. It seems to hit 200. So it's not fair. I have, however, seen over 100 with pure SARE rule sets so many of them were hit. {^_-} - Original Message - From: Roman Serbski [EMAIL PROTECTED] Hi all, What was the highest score you've ever seen? I received a message yesterday that was scored with 51.9(!). =) SA in action: ;-)
Sat, 30 Apr 2005 19:45:21 KGST:80593: SA: REPORT hits = 51.9/3.5
4.1 MIME_BOUND_DD_DIGITS    Spam tool pattern in MIME boundary
1.2 SUBJ_HAS_SPACES         Subject contains lots of white space
3.5 HELO_DYNAMIC_IPADDR2    Relay HELO'd using suspicious hostname (IP addr 2)
3.8 MSGID_SPAM_CAPS         Spam tool Message-Id: (caps variant)
0.1 RCVD_BY_IP              Received by mail server with no name
0.0 FROM_ILLEGAL_CHARS      From contains too many raw illegal characters
2.9 SUBJ_ILLEGAL_CHARS      Subject contains too many raw illegal characters
2.1 HEAD_ILLEGAL_CHARS      Header contains too many raw illegal characters
0.5 HTTP_ESCAPED_HOST       URI: Uses %-escapes inside a URL's hostname
0.2 HTTP_EXCESSIVE_ESCAPES  URI: Completely unnecessary %-escapes inside a URL
2.0 HTML_TAG_EXIST_MARQUEE  BODY: HTML has marquee tag
0.0 HTML_TEXT_AFTER_HTML    BODY: HTML contains text after HTML close tag
0.1 HTML_TEXT_AFTER_BODY    BODY: HTML contains text after BODY close tag
0.0 HTML_MESSAGE            BODY: HTML included in message
0.0 HTML_FONT_FACE_BAD      BODY: HTML font face is not a word
0.1 HTML_FONT_BIG           BODY: HTML tag for a big font size
0.8 HTML_FONT_LOW_CONTRAST  BODY: HTML font color similar to background
0.1 MPART_ALT_DIFF          BODY: HTML and text parts are different
0.0 HTML_SHOUTING3          BODY: HTML has very strong shouting markup
0.1 RAZOR2_CF_RANGE_51_100  BODY: Razor2 gives confidence level above 50% [cf: 100]
0.0 HTML_NONELEMENT_00_10   BODY: 0% to 10% of HTML elements are non-standard
1.9 BAYES_99                BODY: Bayesian spam probability is 99 to 100% [score: 1.]
0.2 MIME_HTML_ONLY          BODY: Message only has text/html MIME parts
0.5 HTML_EVENT_UNSAFE       BODY: HTML contains unsafe auto-executing code
0.0 MIME_QP_LONG_LINE       RAW: Quoted-printable line longer than 76 chars
1.5 RAZOR2_CHECK            Listed in Razor2 (http://razor.sf.net/)
0.0 RCVD_IN_SORBS_HTTP      RBL: SORBS: sender is open HTTP proxy server [200.89.154.29 listed in dnsbl.sorbs.net]
0.4 RCVD_IN_NJABL_PROXY     RBL: NJABL: sender is an open proxy [200.89.154.29 listed in combined.njabl.org]
3.1 RCVD_IN_XBL             RBL: Received via a relay in Spamhaus XBL [200.89.154.29 listed in sbl-xbl.spamhaus.org]
2.0 RCVD_IN_SORBS_DUL       RBL: SORBS: sent directly from dynamic IP address [200.89.154.29 listed in dnsbl.sorbs.net]
3.8 RCVD_IN_DSBL            RBL: Received via a relay in list.dsbl.org [http://dsbl.org/listing?200.89.154.29]
0.1 RCVD_IN_NJABL_DUL       RBL: NJABL: dialup sender did non-local SMTP [200.89.154.29 listed in combined.njabl.org]
1.0 URIBL_SBL               Contains an URL listed in the SBL blocklist [URIs: ourk2.com]
1.5 URIBL_WS_SURBL          Contains an URL listed in the WS SURBL blocklist [URIs: ourk2.com]
3.2 URIBL_OB_SURBL          Contains an URL listed in the OB SURBL blocklist [URIs: ourk2.com]
4.1 RCVD_DOUBLE_IP_SPAM     Bulk email fingerprint (double IP) found
0.6 FORGED_OUTLOOK_HTML     Outlook can't send HTML message only
2.4 MIME_HTML_ONLY_MULTI    Multipart message only has text/html MIME parts
0.0 UPPERCASE_25_50         message body is 25-50% uppercase
0.0 MISSING_MIMEOLE         Message has X-MSMail-Priority, but no X-MimeOLE
3.9 FORGED_MUA_OUTLOOK      Forged mail pretending to be from MS Outlook
Sat, 30 Apr 2005 19:45:21 KGST:80593: SA: yup, this smells like SPAM - hits=51.9 - rejecting message...
Re: SA + SQL + per-user prefs
Gerald V. Livingston II wrote: OK, this is probably just an over-cautious MySQL question. All of the examples I look at for setting up per-user prefs using SQL show creating a table that looks like:
username pref value
So, if I want to allow users to control 5 values I would have a table that looks like this:
user1 pref1 value1
user1 pref2 value2
user1 pref3 value3
user1 pref4 value4
user1 pref5 value5
user2 pref1 value1
user2 pref2 value2
user2 pref3 value3
user2 pref4 value4
user2 pref5 value5
user3 . etc.
When talking about importing a userbase of 6000+ that's gonna be a TALL table really fast.
30,000 rows (5 * 6,000) isn't a tall SQL table at all, IMHO. Arvinn
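To put a rough number on the "tall table" worry, here is a minimal sketch that builds the username/pref/value layout quoted above with 6,000 users and 5 preferences each, then looks up one user's rows through an index. It uses Python's bundled sqlite3 as a stand-in for MySQL, and the table name `userpref`, the column names, and the index name are assumptions made for illustration, not a schema taken from this thread.

```python
import sqlite3


def build_prefs_db(num_users=6000, prefs_per_user=5):
    """Create an in-memory table with one row per (user, preference)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userpref ("
        " username TEXT NOT NULL,"
        " preference TEXT NOT NULL,"
        " value TEXT NOT NULL)"
    )
    # Hypothetical index name; the point is that lookups go by username.
    conn.execute("CREATE INDEX idx_userpref_username ON userpref (username)")
    rows = [
        (f"user{u}", f"pref{p}", f"value{p}")
        for u in range(1, num_users + 1)
        for p in range(1, prefs_per_user + 1)
    ]
    conn.executemany("INSERT INTO userpref VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def prefs_for(conn, username):
    """Return one user's preferences as a dict, using the username index."""
    cur = conn.execute(
        "SELECT preference, value FROM userpref WHERE username = ?",
        (username,),
    )
    return dict(cur.fetchall())
```

With an index on the username column, fetching one user's five rows does not depend on scanning the other 29,995, which is consistent with Arvinn's point that this is a small table by SQL standards.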
system-wide AWL in SQL?
Hi. Is it possible to keep my system-wide AWL setup when using 'spamc -u recipient' with user preferences and AWL stored in SQL? I currently have a system-wide AWL of more than a million rows that it would be a pity to lose. Or should I switch to personal AWLs anyway? Arvinn
Re: Question about Bayes training - mozilla specifically
Bookworm wrote: I've read through the archives several times, and hoped that over the last year or so someone would build the functionality, or at least mention it one way or another - I haven't seen it. Is there any way to take an already trained Mozilla bayes structure and hand it directly off to SpamAssassin? For me, at least, that would eliminate almost all of the spam my server is receiving - Mozilla spots it instantly, but SpamAssassin is missing at least half. Troy Belding Bookworm Computing Mozilla stores its mail in mbox format, so you can simply use your good folders (one mbox each) for training HAM and your Junk folders for training SPAM. Just go and have a look in the file system, where Mozilla stores its files. mbox-files typically don't have an extension. Jo
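Jo's suggestion can be sketched as follows: check that a Mozilla folder file really parses as an mbox, then build the matching sa-learn command line. This is only an illustration; the helper names are hypothetical, the folder path is whatever your own profile contains, and the script assembles the command without running it.

```python
import mailbox


def count_messages(mbox_path):
    """Return how many messages the mbox file at mbox_path contains."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def sa_learn_command(mbox_path, is_spam):
    """Build the argv for training sa-learn from a single mbox file."""
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]
```

The returned list can be handed to `subprocess.run` once the message count looks plausible: good folders with `is_spam=False`, the Junk folder with `is_spam=True`.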
Raising the score...
Hello, I have an old email address that a few contacts still use to reach me. I've tried to get everyone up to date on the new address but no luck. That's not really the issue though... The reason I changed addresses was that the spam that was coming in was all addressed to the old address. I see that SA has a concept of 'blacklist_to' but that will probably be overkill...right? If I set up whitelists for the people who I know...and who still use my old address...and blacklist all other mail that is addressed to this address...will that work? Is there a better way...besides begging these contacts to finally update their address books? :) TIA, Kevin
RE: system-wide AWL in SQL?
Hi. Is it possible to keep my system-wide AWL setup when using 'spamc -u recipient' with user preference and AWL stored in SQL? Yes. Follow the instructions in the readme files. For user prefs and MySQL, see http://wiki.apache.org/spamassassin/UsingSQL Philipp
Observation on secondary MX
About a month ago, there was a discussion on the list about how spammers specifically target secondary MX records. After reading I verified that indeed 99% of the mail that flowed through my store-and-forward secondary mail server was spam. So, I removed the second MX record from my DNS zone, but did not actually decommission the server itself. The interesting thing is that now, about a month later, I'm still seeing spam going to that server! I wonder if the spammers have cached the old MX entry or if they have some database of mail server addresses and what domains they will accept email for.
Re: Observation on secondary MX
On 5/2/2005 1:48 PM +0200, Kevin Peuhkurinen wrote: spam going to that server! I wonder if the spammers have cached the old MX entry Jup. Niek
Web based helpdesk tool for SA?
Lately I've been thinking that something that would really be useful for SA is a web based helpdesk tool. The idea is to help out companies that use SA as a proxy in front of their Notes/Exchange/Groupwise servers (such as my own). The MTA on the SpamAssassin box would quarantine spam on the server. Then, when an end user complained about not getting an email, the internal helpdesk could use this tool to search through the quarantine for the false positive and have it delivered (and possibly even run through sa-learn --ham as well) with a click or two. I don't think that this would be hard to do but before I go dusting off my PHP for Dummies book, does anyone know if something like this already exists? Thanks, Kevin
Re: system-wide AWL in SQL?
Philipp Snizek wrote: Hi. Is it possible to keep my system-wide AWL setup when using 'spamc -u recipient' with user preference and AWL stored in SQL? Yes. Follow the instructions in the readme files. For user prefs and MySQL, see http://wiki.apache.org/spamassassin/UsingSQL Philipp Already read it. System-wide AWL is not discussed there, is it? Arvinn
RE: Web based helpdesk tool for SA?
-Original Message- From: Kevin Peuhkurinen [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 7:59 AM To: users@spamassassin.apache.org Subject: Web based helpdesk tool for SA? Lately I've been thinking that something that would really be useful for SA is a web based helpdesk tool. The idea is to help out companies that use SA as a proxy in front of their Notes/Exchange/Groupwise servers (such as my own). The MTA on the SpamAssassin box would quarantine spam on the server. Then, when an end user complained about not getting an email, the internal helpdesk could use this tool to search through the quarantine for the false positive and have it delivered (and possibly even run through sa-learn --ham as well) with a click or two. I don't think that this would be hard to do but before I go dusting off my PHP for Dummies book, does anyone know if something like this already exists? IMHO, this is something that is needed. However, it would have to be separate user-based quarantines. Otherwise there would be privacy issues with one big quarantine that everyone could sift through. This does exist. Many people have done it, and slapped a commercial price tag on it :) I was hoping someone would create it for SA under the GPL. :) --Chris
RE: The highest score?
-Original Message- From: jdow [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 1:41 AM To: users@spamassassin.apache.org Subject: Re: The highest score? I cheat. I have a couple personal rules guaranteed to hit spam and no ham whatsoever. They hit 100. MOM Agent is guaranteed spam. It seems to hit 200. So it's not fair. I have, however, seen over 100 with pure SARE rule sets so many of them were hit. {^_-} - Original Message - From: Roman Serbski [EMAIL PROTECTED] Hi all, What was the highest score you've ever seen? I received a message yesterday that was scored with 51.9(!). =) SA in action: ;-) Yeah, I'm running SARE, plus beta rules, plus all the URIBL lists (Including those not officially announced), and some personal rules. I think my avg spam score is something like 25 now. --Chris
Raising the score...
Hello, sorry for the repost...it ended up as a reply to something else...a SUE on my part... I have an old email address that a few contacts still use to reach me. I've tried to get everyone up to date on the new address but no luck. That's not really the issue though... The reason I changed addresses was that the spam that was coming in was all addressed to the old address. I see that SA has a concept of 'blacklist_to' but that will probably be overkill...right? If I set up whitelists for the people who I know...and who still use my old address...and blacklist all other mail that is addressed to this address...will that work? Is there a better way...besides begging these contacts to finally update their address books? :) TIA, Kevin
Re: Web based helpdesk tool for SA?
Chris Santerre wrote: IMHO, this is something that is needed. However, it would have to be separate user-based quarantines. Otherwise there would be privacy issues with one big quarantine that everyone could sift through. I believe that most commercial anti-spam systems provide a means for administrators to look at a subset of the information about messages in the quarantine but not the text of the email itself. I agree that there is definitely still a privacy issue with this, but there are all kinds of issues with allowing end users to manage their own personal quarantines as well. Corporations need to decide which issues are most important to them and then have the tools necessary to implement solutions that work for them. Therefore, a product that could allow both individual end-user access to quarantines and admin access to entire quarantines (but not message contents) would probably be of the greatest value. Someone pointed me to Mailwatch, which looks like a good starting point but is specifically tied to MailScanner. This hypothetical product would need to be modular in order to accommodate a range of configurations.
RE: INVALID_MSGID hitting improperly?
So, looking at: /GUID:QPywoUg6DZ06+yvqCupCVJw*/G=Cam/S=Dowlat/OU=Corporate-Markham/O=A lcate l Cable/PRMD=ACAB/ADMD=ATTMAIL/C=CA/@MHS -GUID:QnGodydG460CKmx35BCOvbw*-G=Cam-S=Dowlat-OU=Corporate-Markham-O=A lcate l Cable-PRMD=ACAB-ADMD=ATTMAIL-C=CA-@MHS Looking at the rule, I'm surprised they aren't BOTH declared invalid. [RFC quoting deleted on why a space isn't legal in msg-id] Ok, I buy that. And as another poster pointed out, they both were ruled that way for him. You see, we run SpamAssassin on our perimeter MTA so that we can reject messages that score 10 or higher at SMTP time, while messages scoring 5 or higher are marked as spam but still delivered. All the specific rules for rejected messages are logged, but not for accepted messages. I'd assumed that the past message in the log I looked at, since it wasn't even marked as spam, wouldn't have had enough of a negative score to overcome the 20 I'd put the INVALID_MSGID rule at. But I see that assumption must have been incorrect... [Different poster] BTW, why have *any* single rule scored at 20? Especially this one. To be able to refuse obvious spam at the perimeter, this machine is our incoming SMTP gateway. However, after it accepts a message for delivery, it still must pass the message off to our Internet firewall for delivery. The firewall, as configured by the vendor, has a rule to reject e-mail with invalid message IDs. Assuming both would reject/accept identically for a given msg-id, it made sense to reject it right away, rather than accepting it for delivery and then having the firewall end up trying to deliver a non-delivery message to the sender. That is what occurred frequently before I raised that rule to 20. However, it seems the rule on the firewall doesn't mind spaces in the msg-id, as it did let the message in once I restored the normal score to INVALID_MSGID. Which makes sense from a firewall perspective, I suppose.
To them, they're not trying to prevent spam, but possible malicious headers which might cause internal e-mail machines to be compromised by such things as buffer overflows when processing the e-mail. In that light, it's hard to imagine a space character causing much of an issue with any MTA.
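The two-threshold policy described in this message (reject at SMTP time at 10 or higher, tag but deliver at 5 or higher) can be sketched as below. The function name and the example rule scores are hypothetical; only the two thresholds come from the post.

```python
REJECT_AT = 10.0  # at or above this total, refuse the message at SMTP time
TAG_AT = 5.0      # at or above this total, mark as spam but still deliver


def disposition(rule_scores):
    """Map the sum of per-rule scores to 'reject', 'tag' or 'deliver'."""
    total = sum(rule_scores)
    if total >= REJECT_AT:
        return "reject"
    if total >= TAG_AT:
        return "tag"
    return "deliver"
```

A single rule scored at 20 rejects on its own unless other rules contribute at least 10 points in the negative direction, for example `disposition([20.0, -12.0])` lands in the tag range, which is the kind of offset the poster had assumed could not happen.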
Re: system-wide AWL in SQL?
On Mon, May 02, 2005 at 02:54:31PM +0200, Arvinn Løkkebakken wrote: Yes. Follow the instructions in the readme files. For user prefs and MySQL, see http://wiki.apache.org/spamassassin/UsingSQL Already read it. System-wide AWL is not discussed there, is it? Probably not. There is a concept in 3.1 that allows you to do systemwide or groupwide AWL dbs in SQL, similar to how you can currently do it in Bayes (via override_username). Michael
syntax error
Hi, I have just installed the latest release of SpamAssassin (3.0.3 from the tarball) on a Debian system. Everything seems to work fine, but I get a syntax error during the URIDNS test when I run spamd -D:
...
debug: plugin: Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x8496a28) implements 'parsed_metadata'
debug: dns_available set to yes in config file, skipping test
debug: decoding: no encoding detected
debug: URIDNSBL: domains to query:
debug: is Net::DNS::Resolver available? yes
debug: Net::DNS version: 0.49
debug: plugin: Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x8496a28) implements 'check_post_dnsbl'
debug: running meta tests; score so far=2.553
Failed to run meta SpamAssassin tests, skipping some: syntax error at (eval 48) line 356, near ) { syntax error at (eval 48) line 365, near ; }
debug: running header regexp tests; score so far=2.553
debug: running body-text per-line regexp tests; score so far=2.553
debug: running uri tests; score so far=2.553
debug: running raw-body-text per-line regexp tests; score so far=2.553
debug: running full-text regexp tests; score so far=2.553
debug: Running tests for priority: 1000
debug: running meta tests; score so far=2.553
debug: running header regexp tests; score so far=2.553
I don't know which file to check. If someone could give me a hint, it would be appreciated. Thanks in advance. Julien
Re: Question about Bayes training - mozilla specifically
Jo wrote: Bookworm wrote: I've read through the archives several times, and hoped that over the last year or so someone would build the functionality, or at least mention it one way or another - I haven't seen it. Is there any way to take an already trained Mozilla bayes structure and hand it directly off to SpamAssassin? For me, at least, that would eliminate almost all of the spam my server is receiving - Mozilla spots it instantly, but SpamAssassin is missing at least half. Troy Belding Bookworm Computing Mozilla stores its mail in mbox format, so you can simply use your good folders (one mbox each) for training HAM and your Junk folders for training SPAM. Just go and have a look in the file system, where Mozilla stores its files. mbox-files typically don't have an extension. Jo The issue is not so much that - I've dumped all my ham/spam through spamassassin - it's still not as good. The only thing I can see that's different is that Mozilla MUST have its own bayes database that isn't dependent upon the actual email folders themselves. (I stopped storing all the junk mail when I reached about 15,000). I have no clue where that is, but I thought maybe someone here did, and knew how to convert it to something that spamassassin could use. Oh well - I'll try the mbox deal later. I only have about 80,000 emails I could process through.. Thanks! Troy
Re: Observation on secondary MX
... About a month ago, there was a discussion on the list about how spammers specifically target secondary MX records. After reading I verified that indeed 99% of the mail that flowed through my store-and-forward secondary mail server was spam. So, I removed the second MX record from my DNS zone, but did not actually decommission the server itself. The interesting thing is that now, about a month later, I'm still seeing spam going to that server! I wonder if the spammers have cached the old MX entry or if they have some database of mail server addresses and what domains they will accept email for. Yes and yes. I still receive (and trap as spam) email sent for domains I used to secondary for, but haven't in some cases for almost a year. They *must* keep databases, if not for the domains, at least for the email accounts themselves. (100% of the email sent to these domains and/or accounts is spam). Paul Shupak [EMAIL PROTECTED]
Re: Web based helpdesk tool for SA?
Kevin Peuhkurinen [EMAIL PROTECTED] wrote on 05/02/2005 06:59:01 AM: Lately I've been thinking that something that would really be useful for SA is a web based helpdesk tool. The idea is to help out companies that use SA as a proxy in front of their Notes/Exchange/Groupwise servers (such as my own). The MTA on the SpamAssassin box would quarantine spam on the server. Then, when an end user complained about not getting an email, the internal helpdesk could use this tool to search through the quarantine for the false positive and have it delivered (and possibly even run through sa-learn --ham as well) with a click or two. I don't think that this would be hard to do but before I go dusting off my PHP for Dummies book, does anyone know if something like this already exists? Thanks, Kevin Take a look at Maia Mailguard. Pretty slick tool, lots you can do with it. Users can control their own quarantine, and I believe an Admin can also do things for a user, it's been a while since I've looked at it so I don't recall all the features. Andy
RE: Blacklist Not Working
First, thanks for the help. Craig noticed that the rule ALL_TRUSTED was matched. There was a potential issue with Trusted Path if trusted_networks was not configured. I tried that. The final mail server is Exchange, and I am having a hard time getting the headers back from the users. I posted a link to the Trusted Path issue in my response to Craig. Thanks again, Ron Shuck, CISSP, GCIA, CCSE - Managing Consultant Buchanan Associates - People. Process. Technology.
-Original Message- From: Matt Kettler [mailto:[EMAIL PROTECTED] Sent: Friday, April 29, 2005 11:56 AM To: Ron Shuck Cc: Craig McLean; users@spamassassin.apache.org Subject: Re: Blacklist Not Working
Ron Shuck wrote: Here is the log. I don't have the message, but as you can see it did not match the blacklist.
---log--
Apr 24 04:39:43 mail postfix/smtpd[25746]: connect from castile.calmra.com[72.11.146.117]
Apr 24 04:39:44 mail postfix/smtpd[25746]: AE20883C: client=castile.calmra.com[72.11.146.117]
Apr 24 04:39:45 mail postfix/cleanup[26437]: AE20883C: message-id=[EMAIL PROTECTED]
Apr 24 04:39:45 mail postfix/qmgr[4304]: AE20883C: from=[EMAIL PROTECTED], size=2034, nrcpt=1 (queue active)
Apr 24 04:39:45 mail spamd[14218]: connection from localhost.localdomain [127.0.0.1] at port 48918
Apr 24 04:39:45 mail spamd[14218]: info: setuid to filter succeeded
Apr 24 04:39:45 mail spamd[14218]: processing message [EMAIL PROTECTED] for filter:501.
Apr 24 04:39:46 mail spamd[14218]: clean message (4.8/5.0) for filter:501 in 1.2 seconds, 2000 bytes.
Apr 24 04:39:46 mail spamd[14218]: result: . 4 - ALL_TRUSTED,AWL,BAYES_20,DNS_FROM_AHBL_RHSBL,HTML_50_60,HTML_IMAGE_ONLY_12,HTML_IMAGE_RATIO_02,HTML_MESSAGE,MIME_HTML_MOSTLY,MPART_ALT_DIFF,URIBL_OB_SURBL,URIBL_SBL,URIBL_WS_SURBL scantime=1.2,size=2000,mid=[EMAIL PROTECTED],bayes=0.0627053670923895,autolearn=no
local.cf snippet
blacklist_from [EMAIL PROTECTED]
snip
Ok, now what did the headers in the message look like?
The from quoted in your logfile is the envelope, which might not have been present in the message at the time SA saw it. SA doesn't get the envelope directly, so that from is completely irrelevant unless your MTA or MDA inserted it into a Return-Path: header before SpamAssassin got called.
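The distinction Matt draws can be illustrated with a small sketch: a filter that only sees the message text can match the envelope sender only if some earlier hop copied it into a header such as Return-Path. The code below is a rough imitation for illustration, not SpamAssassin's implementation (which may consult other headers as well); the function names and the glob matching via `fnmatch` are assumptions.

```python
from email import message_from_string
from email.utils import parseaddr
from fnmatch import fnmatch


def visible_senders(raw_message):
    """Collect sender addresses that actually appear in the message headers."""
    msg = message_from_string(raw_message)
    found = {}
    for header in ("From", "Return-Path"):
        value = msg.get(header)
        if value:
            found[header] = parseaddr(value)[1].lower()
    return found


def matches_blacklist(raw_message, pattern):
    """True if any visible sender address matches the glob-style pattern."""
    senders = visible_senders(raw_message).values()
    return any(fnmatch(addr, pattern.lower()) for addr in senders)
```

If the MTA log shows `from=bounce@spam.example` but the message handed to the filter carries only `From: news@other.example`, a pattern written against the log's address has nothing to match.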
spamd log error
Hello, I am receiving the following errors every time mail is processed by spamd. Any ideas on a solution or what the problem is? Derril H
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in hash element at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 321, GEN2 line 542.
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in hash element at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 322, GEN2 line 542.
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in hash element at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 322, GEN2 line 542.
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in hash element at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 321, GEN2 line 542.
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in hash element at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 322, GEN2 line 542.
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in hash element at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 322, GEN2 line 542.
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 210, GEN2 line 542.
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 212, GEN2 line 542.
May 2 08:04:52 admin2 spamd[19328]: Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/site_perl/5.8.1/Mail/SpamAssassin/Message/Metadata/Received.pm line 213, GEN2 line 542.
May 2 08:04:53 admin2 spamd[19328]: error: Can't locate Net/DNS/RR/A.pm in @INC (@INC contains: ../lib /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/5.8.1/i386-linux-thread-multi /usr/lib/perl5/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at (eval 48) line 3, GEN2 line 542._ No such file or directory, continuing
Re: Upgrading to 3.0.3 - *CPAN* indexes stale?
Dan O'Brien wrote: Now that I'm trying to update my production server from 3.0.2 to 3.0.3 (since Friday eve), every CPAN mirror I try results in the following messages Going to read /root/.cpan/sources/modules/02packages.details.txt.gz Database was generated on Sat, 19 Mar 2005 21:41:38 GMT CPAN: HTTP::Date loaded ok Warning: This index file is 42 days old. Please check the host you chose as your CPAN mirror for staleness. I'll continue but problems seem likely to happen. CPAN then says that Mail::SpamAssassin [version 3.0.2] is up to date. Is CPAN acting goofy for anyone else? I was having the same problem last week BUT this morning, the CPAN mirrors I used were updated and I was able to upgrade via CPAN just fine. Try it again this week and see if it works for you now. :) Peac... Tom
Re: SA + SQL + per-user prefs
On Mon, 2005-05-02 at 09:34 +0200, Arvinn Løkkebakken wrote: Gerald V. Livingston II wrote: OK, this is probably just an over-cautious MySQL question. All of the examples I look at for setting up per-user prefs using SQL show creating a table that looks like:
username pref value
So, if I want to allow users to control 5 values I would have a table that looks like this:
user1 pref1 value1
user1 pref2 value2
user1 pref3 value3
user1 pref4 value4
user1 pref5 value5
user2 pref1 value1
user2 pref2 value2
user2 pref3 value3
user2 pref4 value4
user2 pref5 value5
user3 . etc.
When talking about importing a userbase of 6000+ that's gonna be a TALL table really fast.
30,000 rows (5 * 6,000) isn't a tall SQL table at all, IMHO.
Nope, but think of how it would scale. The design above is bad because there is no unique data in there, so the table will get slow. A better design would be this:
1. A table with just users in it, each with a unique user ID, e.g.:
UsersTable
UID  Friendlyname
1    bob
2    joe
2. A table for each preference, linked back by the UID in the first table:
pref1Table
UID  Value
1    10
SA can then join the tables based on the UID, and the application only needs to be passed the UID to get all the values. You can also gain efficiencies with these smaller tables because you can optimise what fields are in there (e.g. your SpamCutoffTable will only have integer and tinyint as field types).
Your only problem would perhaps be telling the application which values the user has customised, but you could fix that in two (or four) ways, which would alter the number of select statements needed:
UsersPrefsTable
UID  Preferences
1    pref1, pref2, pref3
A different way of doing this is multiple fields with booleans:
UsersPrefsTable
UID  pref1  pref2  pref3
1    1      1      0
Or you can build it into your original users table:
UsersTable
UID  Friendlyname  Preferences
1    bob           pref1, pref2, pref3
The other way:
UsersTable
UID  Friendlyname  pref1  pref2  pref3  pref4
1    bob           1      0      0      1
I'm looking into integrating user prefs this quarter where I work, and I do have some concerns about how it will scale (e.g., with MySQL replication you need to send writes to a different machine from reads if you need to have separate databases, like one on each machine for reads and a master for writes). I wish more apps could be more db-aware :) Cheers Mike
--
Mike Grice, Systems Engineer, PlusNet plc.
Broadband Solutions for Home Business @ www.plus.net
PlusNet - The smarter way to broadband
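Mike's normalized layout (a users table plus one table per preference, joined on UID) might look like the sketch below. It runs on Python's bundled sqlite3 rather than MySQL, and the table names, column types and sample values are assumptions chosen to mirror the example rows above; it does not show how SpamAssassin itself would be configured to read such a schema.

```python
import sqlite3


def build_normalized_db():
    """Create users plus two per-preference tables keyed by uid."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (
            uid INTEGER PRIMARY KEY,
            friendlyname TEXT UNIQUE NOT NULL
        );
        CREATE TABLE pref1 (
            uid INTEGER PRIMARY KEY REFERENCES users(uid),
            value INTEGER NOT NULL
        );
        CREATE TABLE pref2 (
            uid INTEGER PRIMARY KEY REFERENCES users(uid),
            value INTEGER NOT NULL
        );
        INSERT INTO users VALUES (1, 'bob');
        INSERT INTO users VALUES (2, 'joe');
        INSERT INTO pref1 VALUES (1, 10);
        INSERT INTO pref2 VALUES (1, 3);
        INSERT INTO pref2 VALUES (2, 7);
        """
    )
    return conn


def prefs_by_uid(conn, uid):
    """Return (friendlyname, pref1, pref2); unset preferences come back as None."""
    return conn.execute(
        """
        SELECT u.friendlyname, p1.value, p2.value
        FROM users AS u
        LEFT JOIN pref1 AS p1 ON p1.uid = u.uid
        LEFT JOIN pref2 AS p2 ON p2.uid = u.uid
        WHERE u.uid = ?
        """,
        (uid,),
    ).fetchone()
```

The LEFT JOINs also address the "which values has the user customised" question without a separate bookkeeping table: a missing row shows up as `None`, at the cost of one join per preference.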
Re: SA + SQL + per-user prefs
On Mon, May 02, 2005 at 04:33:28PM +0100, Mike Grice wrote: Nope, but think of how it would scale. The design above is bad because there is no unique data in there, so the table will get slow. A better design would be this: Howdy, SpamAssassin is an open source project that welcomes contributions from the community. If you see a particular itch that you would like to scratch, I highly encourage you to scratch it. Once you've got some working code, feel free to post it here or on the wiki to get feedback from folks. If it's a widespread and useful feature then it may eventually make its way into the source base. This is exactly how I got my start working with SpamAssassin: I wanted to be able to store the Bayes and AWL data in SQL. I spent many, many months working on and perfecting the code, and now it's in widespread use by many SpamAssassin users. Michael
Re: SpamAssassin 3.0.3 Released
Hello, Am I the only one with problems checking the PGP sig of the tarball? BR, Matías.
Re: SpamAssassin 3.0.3 Released
On Mon, May 02, 2005 at 12:57:54PM -0300, Matias Lopez Bergero wrote: Am I the only one with problems checking the PGP sig of the tarball? Are you by chance using GPG 1.4.x? There is this note in the release announcement: Note: GnuPG 1.4.0, and possibly 1.3.x versions, seem to have problems verifying certain signature files, including the type as used for SpamAssassin releases. If you are running an affected version, please verify the code using both MD5 and SHA1 sum values instead. Michael
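For anyone following the fallback in that release note, a minimal sketch of checking both digests of a downloaded file is shown below. The function names are hypothetical, and the expected values must be copied from the release announcement itself; none are reproduced here.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for the file, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def verify(path, expected_md5, expected_sha1):
    """True only if both digests match the published values."""
    md5_hex, sha1_hex = file_digests(path)
    return (
        md5_hex == expected_md5.strip().lower()
        and sha1_hex == expected_sha1.strip().lower()
    )
```

Checking both sums, as the note asks, means a mismatch in either one fails the verification.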
Re: system-wide AWL in SQL?
Michael Parker wrote: Probably not. There is a concept in 3.1 that allows you to do systemwide or groupwide AWL dbs in SQL, similar to how you can currently do it in Bayes (via override_username). Michael Thanks. This shouldn't be that big a change. Is there a patch for getting this into 3.0.3? But maybe I should consider per-user AWL anyway. That sounds a little awkward to me, though, as I have a setup with system-wide Bayes. What are the opinions? Arvinn
RE: regexp: exclude a string
-----Original Message----- From: wolfgang [mailto:[EMAIL PROTECTED] Sent: Sunday, May 01, 2005 2:03 PM To: users@spamassassin.apache.org Subject: Re: regexp: exclude a string In an older episode (Sunday 01 May 2005 12:49), Loren Wilton wrote: /p(?:0|o)rtf(?:0|o)(?:\||l)i(?:0|o)/ but not portfolio /(?!portfolio)p(?:0|o)rtf(?:0|o)(?:\||l)i(?:0|o)/ thanks, works fine. wolfgang Also, I believe Loren has told me to use this if you want to negate more than one: (?!portfolio|portfoil) But NOT to use: (?!p(ortfolio|ortfoil)) This has been your funky regex tip of the day :-) --Chris
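The negative-lookahead trick is easy to try outside SpamAssassin. The sketch below uses Python's re module, whose (?!...) syntax matches Perl's for this case; the second excluded spelling in the multi-word variant is made up purely for illustration.

```python
import re

# Loren's obfuscation pattern: 0 for o, | for l.
LEET = r"p(?:0|o)rtf(?:0|o)(?:\||l)i(?:0|o)"

# Matches every obfuscated spelling, but not the plain word itself.
obfuscated_only = re.compile(r"(?!portfolio)" + LEET)

# Excluding more than one spelling: alternatives go inside ONE lookahead.
# "portfo|io" is a hypothetical second exclusion, chosen only for the demo.
two_exclusions = re.compile(r"(?!portfolio|portfo\|io)" + LEET)

for word in ("portfolio", "p0rtfolio", "portfo|io", "p0rtf0|i0"):
    print(word,
          bool(obfuscated_only.search(word)),
          bool(two_exclusions.search(word)))
```

The lookahead is checked at the position where the match would start, so it vetoes the plain spelling without consuming any characters.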
Re: Reporting scams to fraudwatchinternational
John Andersen wrote: If you use a competent email client you will be offered the option of keeping a local copy, which saves the redundant recipient. Some people deliberately turn this off. I'm not sure why. (I can *sort* of understand it for mailing list mail, but not for direct mail.) Further, you should never assume that other recipients do not see BCCs. That is entirely up to the settings of the recipient's email client. If your MUA is actually adding a real header with BCC: information, it's broken. BCC isn't supposed to be a header in the usual sense; it's a way to tell your mail client to add extra SMTP RCPT TO: commands when sending the message. The recipients should NEVER see those extra recipients. The only way someone might find out about BCC'ed recipients is if they are the server admin (or have access to the mail logs) and are willing to spend the effort to wade through the logs tracking the message ID to see who got a copy. And that only applies in the case where the sender's SMTP server is also the destination; and partially applies if there are multiple recipients at a remote domain. If a remote domain only has one recipient in the list, they will NOT see any information regarding other recipients. -kgd -- Get your mouse off of there! You don't know where that email has been!
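The split between headers and envelope recipients described above can be sketched with Python's standard email package. The helper name and all addresses are placeholders; nothing is actually sent here.

```python
from email.message import EmailMessage


def build_message_and_envelope(sender, to, bcc, subject, body):
    """Return (message, envelope_recipients).

    The BCC addresses appear only in the envelope list (what becomes
    SMTP RCPT TO: commands), never as a header in the message itself.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to)
    msg["Subject"] = subject
    msg.set_content(body)
    envelope = list(to) + list(bcc)
    return msg, envelope


msg, envelope = build_message_and_envelope(
    "me@example.com",
    ["visible@example.net"],
    ["hidden@example.org"],
    "test",
    "hello",
)
```

To hand this to a server, something like smtplib.SMTP(host).send_message(msg, from_addr="me@example.com", to_addrs=envelope) would pass the envelope list separately from the headers.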
Re: SpamAssassin 3.0.3 Released
Michael Parker wrote: On Mon, May 02, 2005 at 12:57:54PM -0300, Matias Lopez Bergero wrote: I'm the only one with problems checking the pgp sig of the tarball? Are you by chance using GPG 1.4.x? There is this note in the release announcement: Note: GnuPG 1.4.0, and possibly 1.3.x versions, seem to have problems verifying certain signature files, including the type as used for SpamAssassin releases. If you are running an affected version, please verify the code using both MD5 and SHA1 sum values instead. Yes. I was using 1.4.0, I tried with 1.2.1 and got the OK :) Thanks, Michael BR, Matías.
Re: OT: The highest score?
Roman Serbski wrote: What was the highest score you've ever seen? I received a message yesterday that was scored with 51.9(!). =) Bah. I've seen a few that scored ~55 with stock 2.64 scores. With SpamCopURI, and custom scores, they jumped to ~80. I *think* I found one that scored ~80 on the stock 2.64 scores once, but I'm not certain. One weekend while I was particularly bored, I started putting together an uberspam that would trip as many stock 2.64 rules as possible. I got about a third of the way through the rules before stopping, and the score was pushing 300. <g> -kgd -- Get your mouse off of there! You don't know where that email has been!
Re: Observation on secondary MX
Niek writes: On 5/2/2005 1:48 PM +0200, Kevin Peuhkurinen wrote: spam going to that server! I wonder if the spammers have cached the old MX entry Jup. BTW I've seen a few discussions recently where people rediscover (sorry Kevin) these behaviours. It might be worthwhile maintaining some kind of spammer tactics knowledge base, on the wiki maybe? --j.
Re: system-wide AWL in SQL?
On Mon, May 02, 2005 at 06:09:32PM +0200, Arvinn Løkkebakken wrote: Thanks. This shouldn't be all that much changes. Is there a patch for getting this in 3.0.3? Search bugzilla, there was a review patch for the 3.0 tree but it never got enough votes to go in so I dropped it. I think it should still apply cleanly to 3.0.3. But maybe I should consider per-user AWL anyway. Sound a little awkward to me though, as I have setup with system-wide bayes. What are the opinions? I'm a firm believer in per-user dbs. It's pretty unlikely that anyone else's mailstream is going to exactly match yours, so you do your users a disservice in trying to make them all fit the same mold. Michael
Re: Reporting scams to fraudwatchinternational
Kris Deugau said: If you use a competent email client you will be offered the option of keeping a local copy, which saves the redundant recipient. Some people deliberately turn this off. I'm not sure why. (I can *sort* of understand it for mailing list mail, but not for direct mail.) Further, you should never assume that other recipients do not see BCCs. That is entirely up to the settings of the recipient's email client. If your MUA is actually adding a real header with BCC: information, it's broken. BCC isn't supposed to be a header in the usual sense; it's a way to tell your mail client to add extra SMTP RCPT TO: commands when sending the message. The recipients should NEVER see those extra recipients. The only way someone might find out about BCC'ed recipients is if they are the server admin (or have access to the mail logs) and are willing to spend the effort to wade through the logs tracking the message ID to see who got a copy. And that only applies in the case where the sender's SMTP server is also the destination; and partially applies if there are multiple recipients at a remote domain. If a remote domain only has one recipient in the list, they will NOT see any information regarding other recipients. I've also seen broken mail servers that add headers based on the RCPT TO:, so you should assume that recipients on the same remote server, BCC'ed or not, may be able to discover each other. But if you're confident your mail server/client isn't doing something stupid then there should be no way for [EMAIL PROTECTED] to discover the message was BCCed to [EMAIL PROTECTED] Jay -- Jay Lee Network / Systems Administrator Information Technology Dept. Philadelphia Biblical University
Re: bayes problem
Payal Rathod wrote: Hi, I am looking after a friend's email server till he returns from his vacation. In his local.cf (SA 2.61 and yes I know it is time for upgrade) file he has,
    bayes_path /etc/mail/spamassassin/bayes
    use_bayes 1
    score BAYES_50 0.001
Also bayes is well trained with,
    -rw---1 root root 5263360 May 2 01:58 bayes_seen
    -rw---1 root root 4210688 May 2 01:58 bayes_toks
All the spam mails are forwarded to an account 'spam'. Lately his users had started complaining that they received more spam than ever, so I checked his spam folder and grepped for bayes in headers. Surprisingly, out of 500 mails none showed bayes in headers. Does that mean bayes has stopped working? Almost certainly. Or, it might only be working for root. How is SA called? From procmail, or something else? One major problem I see is that the bayes files have permissions of 400, but the bayes DB is site-wide. You generally need to use bayes_file_mode 0777 when you specify a bayes_path in your local.cf. (If all users are to use the same bayes DB, they all must be able to read/write the files and have rwx to directories. Since these are deleted/recreated by SA constantly, you can't just use chmod.) If any non-root userID is used when invoking spamassassin, then the bayes DB will not be accessible. If he's using an MTA layer tool that always scans as root, this shouldn't be a problem. However, if he's letting the user's procmailrc call spamassassin or spamc this could be very troublesome. It's also trouble if his MTA layer tool deprivileges itself to a non-root userid. As for receiving more spam than ever: well, you're using SA 2.61, which IS massively outdated. Spam is a moving target, and SpamAssassin does require reasonably frequent updates to keep abreast of changing trends. I'll admit I'm using 2.64, but I'm also using the Mail::SpamCopURI addon, and extensive custom rule tuning to keep up with it. Using an out-of-the-box 2.61 setup, even with bayes, hitrate is going to suffer.
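The "grep for bayes in headers" check can be made a little more precise by looking only at the X-Spam-Status header of each message. The sketch below assumes the rule names (BAYES_00 through BAYES_99) show up in that header, and that the messages are already available as raw strings; how they are read from the 'spam' account is left out.

```python
from email import message_from_string


def has_bayes_hit(raw_message):
    """True if the X-Spam-Status header mentions any BAYES_* rule."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    return "BAYES_" in status


def count_bayes_hits(raw_messages):
    """Count how many of the given raw messages show a BAYES_* hit."""
    return sum(1 for raw in raw_messages if has_bayes_hit(raw))
```

A count of zero across several hundred scanned messages, as reported above, is a strong hint that the scanning user cannot open the bayes database at all.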
Re: bayes problem
Payal Rathod wrote: On Mon, May 02, 2005 at 02:11:19PM -0400, Matt Kettler wrote: How is SA called? from procmail, or something else? For .qmail file with a script ifspamh One major problem I see is that the bayes files have permissions of 400, but the bayes DB is site-wide. You generally need to use bayes_file_mode [...] Right. Do I need 777 or just 744? In general 777. All users that need to access the bayes DB need to be able to write to it, and create/delete temporary files and lock files. This happens most extensively in the event of opportunistic expiry or autolearning. In your case I might do 744, just because the box isn't yours and the admin might not want world-writable files (in which case he shouldn't be using a global bayes DB). However, 744 is really a half-baked solution and won't eliminate bayes problems. As for receiving more spam than ever. Well, you're using SA 2.61, which IS massively outdated. Spam is a moving target, and SpamAssassin does require reasonably frequent updates to keep abreast of changing trends. How safe is it to change to the new version? His is a live server and we don't want to risk anything at all. I wouldn't be doing extensive upgrades on a box you don't normally administer. However, you should let him know that all versions from 2.60 through 2.63 are vulnerable to a DoS attack if a person sends you a maliciously crafted email (it's a bug in the mime decoder which was fixed in 2.64, as well as 3.0.0)
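To see why 744 only half-solves the problem, it helps to expand the octal modes into the familiar ls-style strings. This is a small illustration using Python's stat module; the helper names are invented for the example.

```python
import stat


def describe_mode(mode):
    """Render an octal permission mode the way `ls -l` shows a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def others_can_write(mode):
    """True if users outside the owner and group have write permission."""
    return bool(mode & stat.S_IWOTH)


for mode in (0o744, 0o777):
    print(oct(mode), describe_mode(mode), others_can_write(mode))
```

With 744, everyone except the owner gets read access only, so non-owner users still cannot update tokens or create lock files in a shared bayes DB.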
Re: autolearn=ham
Robert Swan [EMAIL PROTECTED] wrote on 05/02/2005 02:15:45 PM: How do I clear, or unlearn the bayes filter it seems that it is picking up wrong. E-mail that is SPAM has autolearn=ham in the header and this is wrong. I am Running SPAMASSASSIN 3.0.3 on a Linux Red Hat 9 server. (just upgraded) did this in version 3.0.2 also, unrelated I know. Thanks in advance, Robert If it's a single message try: sa-learn --forget orginal.message.to.unlearn If on the other hand you want to clear out the entire bayes db because you think it's corrupted then use: sa-learn --clear man sa-learn for more info. Andy
Re: autolearn=ham
Robert Swan wrote: How do I clear, or unlearn the bayes filter it seems that it is picking up wrong. E-mail that is SPAM has autolearn=ham in the header and this is wrong. Is it? The autolearner uses the score the message would have gotten if bayes was disabled, all userconf (i.e. white/blacklist) rules were disabled, and the AWL was disabled. Post an X-Spam-Status header for the message in question and we can give you some more specific advice, but just because the final score indicated spam, it doesn't mean the autolearner can't decide it's ham. This is particularly true for messages that were heavily hit by a blacklist or AWL rule. IMHO, the default ham learning threshold in current versions of SA is begging for problems like this. I keep mine set at a tiny negative score, but also have a collection of nonspam rules with tiny negative scores. This way, autolearning as ham must be earned by hitting one of the negative scoring rules, but the negative scoring rules can't be abused by spammers as they collectively add up to less than -1.0.
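The effect described above, a message scored as spam overall yet autolearned as ham, can be shown with a deliberately simplified model. Everything in this sketch is an assumption made for illustration: the set of excluded rules, the threshold values (0.1 and 12.0) and the comparison operators are not taken from the SpamAssassin source, and the real autolearner applies further conditions that are omitted here.

```python
# Rules left out of the autolearn score in this simplified model (assumed list).
EXCLUDED_PREFIXES = ("BAYES_",)
EXCLUDED_RULES = {"AWL", "USER_IN_BLACKLIST", "USER_IN_WHITELIST"}


def autolearn_score(hits):
    """Sum the rule scores, skipping bayes, AWL and user white/blacklist hits."""
    return sum(
        score
        for rule, score in hits.items()
        if rule not in EXCLUDED_RULES and not rule.startswith(EXCLUDED_PREFIXES)
    )


def autolearn_decision(hits, ham_threshold=0.1, spam_threshold=12.0):
    """Return 'ham', 'spam' or 'no' under the assumed thresholds."""
    score = autolearn_score(hits)
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


# A message that is spam overall only because of a blacklist hit and bayes.
hits = {"USER_IN_BLACKLIST": 100.0, "BAYES_99": 5.4, "HTML_MESSAGE": 0.001}
print(sum(hits.values()), autolearn_decision(hits))
```

In this model, setting ham_threshold to a small negative value means a message has to hit at least one negative-scoring rule before it can be learned as ham, which is the tuning suggested above.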
Re: OT: The highest score?
Roman Serbski wrote: Hi all, What was the highest score you've ever seen? I received a message yesterday that was scored with 51.9(!). =) I hate to say it, but I've seen scores over 1000.0. All you need to do is include a GTUBE :) USER_IN_BLACKLIST will also jack it up quite a bit with a +100 score. GTUBE and blacklists aside, my highest spam score in recent history (past 4 weeks) was 45.74: score=45.74, required 5, autolearn=spam, AB_URI_RBL 1.00, BAYES_99 5.40, DCC_CHECK 1.00, DRUGS_ERECTILE 1.00, HTML_70_80 0.10, HTML_IMAGE_ONLY_04 1.00, HTML_MESSAGE 0.10, INFO_GREYLIST_NOTDELAYED -0.01, JP_URI_RBL 1.00, LOCAL_RCVD_HELO_XIP 1.50, MIME_HTML_ONLY 0.32, MIME_HTML_ONLY_MULTI 1.10, NO_DNS_FOR_FROM 1.65, OB_URI_RBL 2.10, RAZOR2_CF_RANGE_51_100 0.20, RAZOR2_CHECK 1.05, RCVD_IN_CHINA_KR 2.50, RCVD_IN_DSBL 0.71, RCVD_IN_NJABL_PROXY 2.34, RCVD_IN_SORBS_MISC 0.00, RCVD_IN_XBL 4.92, SARE_RAND_2V 1.50, SPAMCOP_URI_RBL 3.00, SUBJ_VIAGRA 4.10, VIAGRA_ONLINE 4.06, WS_URI_RBL 2.10, X_MESSAGE_INFO 2.00 But I tend to lean towards lowering rule scores from their defaults. I tend to find some SARE rules, etc are a bit overly aggressive in scoring for my tastes.
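As a quick sanity check, the per-rule scores in that report do add up to the stated total. The parsing below is a throwaway sketch that assumes the "RULE score, RULE score, ..." layout shown in the message.

```python
REPORT = (
    "AB_URI_RBL 1.00, BAYES_99 5.40, DCC_CHECK 1.00, DRUGS_ERECTILE 1.00, "
    "HTML_70_80 0.10, HTML_IMAGE_ONLY_04 1.00, HTML_MESSAGE 0.10, "
    "INFO_GREYLIST_NOTDELAYED -0.01, JP_URI_RBL 1.00, LOCAL_RCVD_HELO_XIP 1.50, "
    "MIME_HTML_ONLY 0.32, MIME_HTML_ONLY_MULTI 1.10, NO_DNS_FOR_FROM 1.65, "
    "OB_URI_RBL 2.10, RAZOR2_CF_RANGE_51_100 0.20, RAZOR2_CHECK 1.05, "
    "RCVD_IN_CHINA_KR 2.50, RCVD_IN_DSBL 0.71, RCVD_IN_NJABL_PROXY 2.34, "
    "RCVD_IN_SORBS_MISC 0.00, RCVD_IN_XBL 4.92, SARE_RAND_2V 1.50, "
    "SPAMCOP_URI_RBL 3.00, SUBJ_VIAGRA 4.10, VIAGRA_ONLINE 4.06, "
    "WS_URI_RBL 2.10, X_MESSAGE_INFO 2.00"
)


def parse_scores(report):
    """Turn 'RULE 1.00, RULE2 2.50' into {'RULE': 1.0, 'RULE2': 2.5}."""
    scores = {}
    for item in report.split(","):
        rule, value = item.strip().rsplit(None, 1)
        scores[rule] = float(value)
    return scores


scores = parse_scores(REPORT)
total = sum(scores.values())
print(len(scores), round(total, 2))
```

Sorting the parsed dictionary by value is also a handy way to see which few rules contribute most of a large score.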
Re: autolearn=ham
Robert Swan wrote: How do I clear, or unlearn the bayes filter it seems that it is picking up wrong. E-mail that is SPAM has autolearn=ham in the header and this is wrong. I am Running SPAMASSASSIN 3.0.3 on a Linux Red Hat 9 server. (just upgraded) did this in version 3.0.2 also, unrelated I know. Thanks in advance, Robert Peace he would say instead of goodbyepeace my brother. Remove the bayes db. What are you using? File based? SQL based? Need more info about that. Also in your case, you may either A) turn off autolearn B) change thresholds for spam/ham so this is unlikely to happen again. -- Thanks, James
Re: autolearn=ham
Matt Kettler wrote: Robert Swan wrote: How do I clear, or unlearn the bayes filter it seems that it is picking up wrong. E-mail that is SPAM has autolearn=ham in the header and this is wrong. Is it? If it's spam being learned as ham, then yes, it is wrong. Autolearn may be doing what it's supposed to, but it's still a false negative. An expected one, but a misclassification nonetheless. Robert: just running sa-learn --spam will unlearn the message, then re-learn it as spam. -- Kelson Vibber SpeedGate Communications www.speed.net
RE: autolearn=ham
Hello all, I am using file based bayes DB and do Not have autolearn enabled, I do manual learning using IMAP Spam Begone. Robert Peace he would say instead of goodbyepeace my brother. -Original Message- From: James R [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 3:24 PM To: users@spamassassin.apache.org Subject: Re: autolearn=ham Robert Swan wrote: How do I clear, or unlearn the bayes filter it seems that it is picking up wrong. E-mail that is SPAM has autolearn=ham in the header and this is wrong. I am Running SPAMASSASSIN 3.0.3 on a Linux Red Hat 9 server. (just upgraded) did this in version 3.0.2 also, unrelated I know. Thanks in advance, Robert Peace he would say instead of goodbyepeace my brother. Remove the bayes db. What are you using? File based? SQL based? Need more info about that. Also in your case, you may either A) turn off autolearn B) change thresholds for spam/ham so this is unlikely to happen again. -- Thanks, James
Re: Adding addresses to blacklist manually
Gregory P. Ennis wrote: Everyone, I installed 3.0.3 this afternoon and everything looks good. I finally decided to set up a user alias e-mail address to take advantage of the following command: spamassassin --add-to-blacklist /tmp/$FILENAME When I run this command from root I get a response of 1 message examined. I cannot figure out where, or to what blacklist, this command is adding the address; I would like to be able to check on the results in order to make sure it is working. Any help is appreciated. Thanks, Greg The --add-to-blacklist command only manipulates the AWL statistics for that sender. It biases that sender's AWL statistics by pretending they sent a message that scored +100 and recording it in the AWL db. This effect is somewhat temporary, as over time the number of emails will reduce the impact this has. It's largely intended for correcting errors in the AWL, and not intended to be used as a real blacklist mechanism. There are no command line options that actually blacklist a sender with a static blacklist_from command. If you want to truly blacklist an address, you have to do it using a blacklist_from command in your /etc/mail/spamassassin/local.cf or similar config file.
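The "somewhat temporary" effect is just averaging at work. The toy model below keeps a running total and message count per sender, which is an assumption about how the AWL stores its history made for illustration; the class and method names are hypothetical and do not mirror the real AWL code.

```python
class AwlEntry:
    """Toy per-sender history: a message count and a running score total."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def record(self, score):
        """Record one message with the given score."""
        self.count += 1
        self.total += score

    def add_to_blacklist(self):
        """Pretend the sender sent one message that scored +100."""
        self.record(100.0)

    def mean(self):
        return self.total / self.count if self.count else 0.0


entry = AwlEntry()
for _ in range(4):
    entry.record(2.0)          # four ordinary messages scoring 2.0
entry.add_to_blacklist()
mean_after_blacklist = entry.mean()   # (8 + 100) / 5

for _ in range(20):
    entry.record(2.0)          # twenty more ordinary messages
mean_later = entry.mean()             # (108 + 40) / 25
print(mean_after_blacklist, mean_later)
```

The single +100 entry dominates at first but is steadily diluted, which is why a static blacklist_from line is the right tool for a permanent block.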
Re: autolearn=ham
Kelson wrote: Matt Kettler wrote: Robert Swan wrote: How do I clear, or unlearn the bayes filter it seems that it is picking up wrong. E-mail that is SPAM has autolearn=ham in the header and this is wrong. Is it? If it's spam being learned as ham, then yes, it is wrong. Autolearn may be doing what it's supposed to, but it's still a false negative. An expected one, but a misclassification nonetheless. True. I mis-read Robert's message as implying that the SA autolearn mechanism was going haywire and randomly learning spam as ham for no clear reason. Hence my answer. Sorry for any confusion it may have created. (The rest of the message is generally correct, albeit topically misdirected. The facts about how the autolearner works in my message are correct, albeit some details are omitted for simplicity. Opinions about the threshold are my personal opinions, but they are my actual opinions.)
Re: Question about Bayes training - mozilla specifically
Bookworm wrote: I've read through the archives several times, and hoped that over the last year or so someone would build the functionality, or at least mention it one way or another - I haven't seen it. Is there any way to take an already trained Mozilla bayes structure and hand it directly off to SpamAssassin? For me, at least, that would eliminate almost all of the spam my server is receiving - Mozilla spots it instantly, but SpamAssassin is missing at least half. Here is a project that will export the Mozilla Bayes tokens which would at least be the first step. I'm not sure how hard it would be to then import them into SA. http://bayesjunktool.mozdev.org/
Re: OT: The highest score?
Roman Serbski wrote: What was the highest score you've ever seen? I received a message yesterday that was scored with 51.9(!). =) Unfortunately I just purged the spamtraps, but that's what log files are for. Here's the highest one from this week: Score: 63.173 BAYES_99 BIZ_TLD DOMAIN_RATIO FORGED_IMS_HTML FORGED_IMS_TAGS FORGED_MUA_IMS FORGED_YAHOO_RCVD FROM_ILLEGAL_CHARS HEAD_ILLEGAL_CHARS HTML_90_100 HTML_FORMACTION_MAILTO HTML_IMAGE_ONLY_20 HTML_IMAGE_RATIO_02 HTML_MESSAGE LOCAL_SURBL_MULTI MIME_HTML_ONLY MIME_HTML_ONLY_MULTI MISSING_MIMEOLE MPART_ALT_DIFF MSGID_SPAM_CAPS MSGID_YAHOO_CAPS RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK RCVD_BY_IP RCVD_DOUBLE_IP_SPAM RCVD_HELO_IP_MISMATCH RCVD_IN_DSBL RCVD_IN_NJABL_PROXY RCVD_IN_NJABL_RELAY RCVD_IN_SORBS_HTTP RCVD_NUMERIC_HELO SUBJ_ILLEGAL_CHARS URIBL_OB_SURBL URIBL_SBL URIBL_WS_SURBL The only custom rule in there is LOCAL_SURBL_MULTI, which adds an extra 3 points if 3 or more SURBLs fire. So technically this should only have been 60.173. -- Kelson Vibber SpeedGate Communications www.speed.net
Re: Question about Bayes training - mozilla specifically
On Mon, May 02, 2005 at 03:44:25PM -0500, Stuart Johnston wrote: Bookworm wrote: I've read through the archives several times, and hoped that over the last year or so someone would build the functionality, or at least mention it one way or another - I haven't seen it. Is there any way to take an already trained Mozilla bayes structure and hand it directly off to SpamAssassin? For me, at least, that would eliminate almost all of the spam my server is receiving - Mozilla spots it instantly, but SpamAssassin is missing at least half. Here is a project that will export the Mozilla Bayes tokens which would at least be the first step. I'm not sure how hard it would be to then import them into SA. http://bayesjunktool.mozdev.org/ The bayes backup/restore format is fairly stable and it is pretty easy to create a restore file from alternate sources (that is one of the reasons it was written). It's possibly not documented as well as it should be, but no one has ever asked before, so... You will need the following bits of information:
1) The Raw Token (which needs to be turned into an SHA1 and then into a hex representation, which is probably too simple of an explanation for what is actually going on, so probably needs some more detail and maybe a helper function in the SA code for those that might want to attempt such a thing, not to mention a period in this sentence somewhere.)
2) The atime value for that token - SA bayes works off access times for tokens, so you need to know the last time it was useful; in a pinch you can use the current time, but it is not optimal.
3) The ham count for the token
4) The spam count for the token
5) Number of spam msgs learned
6) Number of ham msgs learned
7) List of msg ids and if they were learned as ham or spam (this can be optional, but that is not optimal, since it would allow for re-learning of msgs, which could throw off your spam/ham counts)
Once you have all that, you throw it into a formatted restore file and then run sa-learn --restore and you are all set. If someone has a dump of one of these files, and it's got all the required information, I'd be happy to take a look to see how feasible it would be. Michael
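Step 1 in the list above, taken at face value, is a SHA1 digest rendered as hex. The sketch below does exactly that and nothing more. As the message itself warns, this is a simplification of what SpamAssassin actually does with tokens, so the output should not be assumed to match what a real restore file expects; the function name and the UTF-8 encoding choice are assumptions.

```python
import hashlib


def token_sha1_hex(raw_token):
    """Hex SHA1 digest of a raw token string (literal reading of step 1 only)."""
    return hashlib.sha1(raw_token.encode("utf-8")).hexdigest()


print(token_sha1_hex("abc"))
```

A helper like this would still need to be checked against a real sa-learn --backup dump before anything is fed to sa-learn --restore.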
Re: Question about Bayes training - mozilla specifically
Michael Parker wrote: On Mon, May 02, 2005 at 03:44:25PM -0500, Stuart Johnston wrote: Bookworm wrote: I've read through the archives several times, and hoped that over the last year or so someone would build the functionality, or at least mention it one way or another - I haven't seen it. Is there any way to take an already trained Mozilla bayes structure and hand it directly off to SpamAssassin? For me, at least, that would eliminate almost all of the spam my server is receiving - Mozilla spots it instantly, but SpamAssassin is missing at least half. Here is a project that will export the Mozilla Bayes tokens which would at least be the first step. I'm not sure how hard it would be to then import them into SA. http://bayesjunktool.mozdev.org/ The bayes backup/restore format is fairly stable and it is pretty easy to create a restore file from alternate sources (that is one of the reasons it was written). It's possibly not documented as well as it should be, but no one has ever asked before, so... You will need the following bits of information:
1) The Raw Token (which needs to be turned into an SHA1 and then into a hex representation, which is probably too simple of an explanation for what is actually going on, so probably needs some more detail and maybe a helper function in the SA code for those that might want to attempt such a thing, not to mention a period in this sentence somewhere.)
2) The atime value for that token - SA bayes works off access times for tokens, so you need to know the last time it was useful; in a pinch you can use the current time, but it is not optimal.
3) The ham count for the token
4) The spam count for the token
5) Number of spam msgs learned
6) Number of ham msgs learned
7) List of msg ids and if they were learned as ham or spam (this can be optional, but that is not optimal, since it would allow for re-learning of msgs, which could throw off your spam/ham counts)
Once you have all that, you throw it into a formatted restore file and then run sa-learn --restore and you are all set. If someone has a dump of one of these files, and it's got all the required information, I'd be happy to take a look to see how feasible it would be. There are some examples in XML format here: http://bayesjunktool.mozdev.org/installation.html Here's a sample:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE tokenfile SYSTEM "trainer_xml.dtd">
    <tokenfile>
    <good_msgs>38</good_msgs>
    <bad_msgs>320</bad_msgs>
    <token> <name>$</name> <good>4</good> <bad>18</bad> </token>
    ...
atimes and msgids are not included.
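Pulling the counts out of such an export is straightforward with Python's standard XML parser. The element names below come from the sample in the message; the assumptions are that token elements sit directly under tokenfile and that the file is well-formed. The embedded sample is a trimmed, closed-off version without the XML declaration and DOCTYPE lines.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for an exported token file (structure assumed from the sample).
SAMPLE = """<tokenfile>
  <good_msgs>38</good_msgs>
  <bad_msgs>320</bad_msgs>
  <token><name>$</name><good>4</good><bad>18</bad></token>
</tokenfile>"""


def parse_tokenfile(xml_text):
    """Return (ham_msgs, spam_msgs, {token: (ham_count, spam_count)})."""
    root = ET.fromstring(xml_text)
    ham_msgs = int(root.findtext("good_msgs"))
    spam_msgs = int(root.findtext("bad_msgs"))
    tokens = {}
    for token in root.findall("token"):
        name = token.findtext("name")
        tokens[name] = (int(token.findtext("good")), int(token.findtext("bad")))
    return ham_msgs, spam_msgs, tokens


print(parse_tokenfile(SAMPLE))
```

That covers items 3 through 6 of the list; the missing atimes and message ids would still have to be filled in some other way, for example with the current time as suggested earlier in the thread.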
Re: My first rule
On May 2, 2005, at 4:39 PM, [EMAIL PROTECTED] wrote: Joe Kletch wrote: So excited--I created my first rule. Congratulations! It ran through lint with no errors and seems to be achieving the requested outcome: move messages from this sender to the recipient's Spam folder, but do not reject (I have simscan rejecting scores over 18 points). The threshold for moving to the Spam folder is 4.0 points. Could the list take a peek at this and make sure I didn't create a rule that will screw up everything from AOL, or advise a better way to handle this? Thanks for your help. mail spamassassin $ cat move_to_spam.cf
    header SLOWHND67 From:addr =~ /[EMAIL PROTECTED]/i
    describe SLOWHND67 Marissa's mail buddy into Spam folder
    score SLOWHND67 3.0
You're guilty of a breach-of-etiquette here - publishing an email address without permission. You anchor the string at the end with a $ - which is fine - but you don't anchor it at the beginning with a ^, which could help. If anyone ever emails Marissa from an address that ends in [EMAIL PROTECTED] - for example, [EMAIL PROTECTED] - this rule will false-positive. To fix, use this regexp: /[EMAIL PROTECTED]/i As an aside - does SpamAssassin allow things like lc From:addr eq '[EMAIL PROTECTED]' ? Probably not... Are you trying to silently discard all mail from this person? If so, I think a client-side rule would be most appropriate. That way you could claim plausible deniability if he found out. ;-) Sorry about the breach, I should have known better. I'll consider this my first foray into regexp as well--been promising to get a handle on this for some time now but alas can't seem to make it happen. Not trying to discard--just let the user deal with the hassle of finding her joke buddy's email in the spam folder as a hint. The client is out of control in terms of the joke and personal emails from the CEO down, and the IT Director is slowly trying to get some point across to the users in his own way. Joe Kletch
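The anchoring point is easy to demonstrate with a stand-in address, since the real one is redacted in the archive. The address below is entirely hypothetical, and the patterns use Python's re module rather than a SpamAssassin rule.

```python
import re

ADDRESS = "buddy@example.com"  # hypothetical stand-in for the redacted address

# Anchored only at the end: any address that merely ENDS with the target matches.
end_only = re.compile(re.escape(ADDRESS) + r"$", re.IGNORECASE)

# Anchored at both ends: only the exact address matches.
both_ends = re.compile(r"^" + re.escape(ADDRESS) + r"$", re.IGNORECASE)

for sender in ("buddy@example.com", "Buddy@Example.com", "notbuddy@example.com"):
    print(sender, bool(end_only.search(sender)), bool(both_ends.search(sender)))
```

re.escape also takes care of the dots in the domain, which would otherwise match any character.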