Re: Regular expression help
--On Wednesday, January 21, 2009 1:04 AM + rje...@vzw.blackberry.net wrote:
> I am attempting to create a regular expression to give a negative score for purchase orders. I need to match the following: PO, PO:, PO#, P.O., P.O. #, PO #
> I have not been able to get this to work correctly. I have the following:
> /\bP\.?O\.?[:#]? [#]?/i

/P\.?O/

Expect it to match things besides purchase orders, but they will be false negatives.

Joseph Brennan
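One way to check such patterns before deploying them is to run them against the sample forms. The sketch below uses Python's `re` module as a stand-in for Perl's regex engine (the constructs used behave the same in both); `TIGHT` is a hypothetical tightened variant, not a pattern proposed in the thread.

```python
import re

# The six forms listed in the original question.
SAMPLES = ["PO", "PO:", "PO#", "P.O.", "P.O. #", "PO #"]

# Joseph Brennan's deliberately loose suggestion (case-sensitive, no anchors):
# it also fires inside ordinary words such as "POST" or "SPOON".
LOOSE = re.compile(r"P\.?O")

# Hypothetical tighter variant: word boundaries on both sides of the P/O
# token, then an optional dot, optional spaces, and an optional ':' or '#'.
TIGHT = re.compile(r"\bP\.?O\b\.?\s*[:#]?", re.IGNORECASE)


def matches(pattern: re.Pattern, text: str) -> bool:
    return pattern.search(text) is not None


if __name__ == "__main__":
    for s in SAMPLES + ["POST", "SPOON", "P.O. Box"]:
        print(f"{s!r:12} loose={matches(LOOSE, s)} tight={matches(TIGHT, s)}")
```

The extra `\b` after the `O` stops hits inside words such as "POST" or "SPOON", but the pattern still matches "P.O. Box", so it can still over-match in the way Joseph describes.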
Re: please help, getting hammered with snowshoe spam
Everyone has given very helpful feedback! At present it definitely sounds like I should tweak my rules and train my bayes. I will try taking steps here and see how it goes. Thank you all so very much! -- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21631249.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
Dennis Hardy wrote:
> Hi, I'm getting hammered by snowshoe spam :-( Any thoughts/advice are appreciated :-)

When this started happening to us, the only solution I found was manually blocking CIDR ranges. Yes, I know, very last millennium, but I didn't find anything else to work with. Some particular snowshoers had patterns I could use, but the addresses under attack seemed to be passed around rapidly among a large number of different outfits, each with a different style. Sadly, Bayes did not help.

Derek
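The manual-CIDR approach Derek describes boils down to an address-in-netblock check, wherever the list ends up living. A minimal Python sketch, assuming a hand-maintained list; the netblocks are RFC 5737 documentation ranges used as placeholders, not real snowshoe sources.

```python
import ipaddress

# Hypothetical, manually maintained list of snowshoe netblocks.
# RFC 5737 documentation ranges, used here only as placeholders.
BLOCKED_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),  # covers .0 through .127 only
]


def is_blocked(ip: str) -> bool:
    """Return True if the connecting IP falls inside any listed CIDR block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETS)


if __name__ == "__main__":
    for ip in ("192.0.2.77", "198.51.100.200", "203.0.113.5"):
        print(ip, is_blocked(ip))
```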
Re: Zero exit-code after SIGPIPE
On Fri, 23 Jan 2009, RW wrote:
> I'm having a problem whereby SpamAssassin is sometimes being killed by SIGPIPE before it has written out the email to stdout, and then returns a zero exit code.

Ouch. Open a bug in Bugzilla. While the devs may read this list, they don't use it for taking bug reports.

-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- We should endeavour to teach our children to be gun-proof rather than trying to design guns to be child-proof --- 4 days until the 42nd anniversary of the loss of Apollo 1
Re: please help, getting hammered with snowshoe spam
Dennis Hardy wrote on Fri, 23 Jan 2009 08:36:59 -0800 (PST):
> see http://www.spamhaus.org/faq/answers.lasso?section=Glossary#233

Ah. I know a lot of spam terms, but this is certainly new to me ;-)

> > If the former, put some example up on a pastebin (not here!).
>
> Yes already done: http://pastebin.com/m4400a74d

As it doesn't contain any headers, I can't tell whether I would have rejected it at the MTA anyway. I get:

X-Spam-Report:
 * 5.0 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
 *     [score: 1.]
 * 3.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
 *     [URIs: twolumpsofcoal.net]
 * 0.1 DIET_1 BODY: Lose Weight Spam

It may not have been in URIBL_BLACK at the time you got it, but there are two other good rules that hit on it. As you are getting BAYES_05, I'd say there's something wrong with your Bayes.

Kai
-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com
Zero exit-code after SIGPIPE
I'm having a problem whereby SpamAssassin is sometimes being killed by SIGPIPE before it has written out the email to stdout, and it then returns a zero exit code.

Whilst I'd be keen to eliminate the SIGPIPE problem, the more important problem is the zero exit code, because it turns delayed mail into lost mail. Getmail treats it as success, delivers an empty email with just its own headers, and deletes the original off the server. If I hack SpamAssassin's default signal handler to return 1, then everything works correctly: the delivery is aborted, the email is left on the POP/IMAP server, and it gets delivered on the next check.

Arguably, it might be pragmatic for the handler to return 0 (not spam) if the "-e" option is used, but if it isn't, it should return a non-zero error code. I'm using SpamAssassin 3.2.5 on FreeBSD 7.1, and running the spamassassin script directly with no command-line options (i.e. without the -e option).
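The behaviour RW asks for can be sketched as follows. This is Python rather than SpamAssassin's Perl, and it is not SpamAssassin's actual handler; it only illustrates a SIGPIPE handler that exits non-zero. The exit status uses the common shell convention of 128 plus the signal number (141 where SIGPIPE is 13, as on FreeBSD and Linux), which is an assumption; RW's own hack simply returned 1.

```python
import os
import signal


def exit_status_for_signal(signum: int) -> int:
    """Shell-style exit status for a process ended by signal `signum`."""
    return 128 + signum


def on_sigpipe(signum, frame):
    # Exit at once with a non-zero status so that a caller such as getmail
    # sees a failure (and can leave the mail on the server for the next
    # check) rather than a successful, empty delivery.
    os._exit(exit_status_for_signal(signum))


def install_sigpipe_handler() -> None:
    signal.signal(signal.SIGPIPE, on_sigpipe)
```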
Re: please help, getting hammered with snowshoe spam
On Fri, 23 Jan 2009, Dennis Hardy wrote:
> Here is what I have been using (from previous help from this mail list!):
> uri SSS_URI30 /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/\w{30}\b/
> score SSS_URI30 1.5
> This uri rule does work very well, but they change the length sometimes, so I have a few rules that handle different lengths. Maybe I should use 29,31 instead of just 30, for example? Am I being too conservative? Should I consider bumping the score of this up more? And my meta up more perhaps?

Again, I'd have to see more examples to comment meaningfully. I would be especially interested in whether or not the part after the domain name is indeed free from punctuation. A long string of unpunctuated letters is less likely to FP than a long string of letters, numbers and underscores.

You might want to anchor your rule with a $, as it may FP if there is stuff in the URI following the string of gibberish. Try it against this very legitimate-looking (if overly verbose) URI:

http://fnord.com/retrieve_document_as_pdf3_file.php?123456

And the rule I suggested makes an attempt to detect gibberish by looking for a "q" that is not followed by a "u", which is rare in English words.

-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Vista: because the audio experience is *far* more important than network throughput. --- 4 days until Wolfgang Amadeus Mozart's 253rd Birthday
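John's point about anchoring can be checked directly. The sketch below translates Dennis's SSS_URI30 pattern into Python's `re` (with `\/` written as `/`) and runs it, with and without a trailing `$`, against the URI John suggests and against a hypothetical 30-letter gibberish path.

```python
import re

# Dennis's SSS_URI30 pattern in Python syntax ("\/" written as plain "/").
UNANCHORED = re.compile(r"\bhttp://[^./]+\.(?i:com|net|info|biz)/\w{30}\b")

# The same pattern with the 30-character run anchored to the end of the URI.
ANCHORED = re.compile(r"\bhttp://[^./]+\.(?i:com|net|info|biz)/\w{30}$")

LEGIT = "http://fnord.com/retrieve_document_as_pdf3_file.php?123456"
GIBBERISH = "http://example.com/" + "qwrtzpsdfg" * 3  # hypothetical 30-letter path

if __name__ == "__main__":
    for name, rx in (("unanchored", UNANCHORED), ("anchored", ANCHORED)):
        print(name, bool(rx.search(LEGIT)), bool(rx.search(GIBBERISH)))
```

The path segment `retrieve_document_as_pdf3_file` happens to be exactly 30 word characters long, so the unanchored rule matches it; the `$` anchor rejects it because `.php?123456` follows.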
Re: please help, getting hammered with snowshoe spam
> Your BAYES is misfiring. The difference between BAYES_05 and BAYES_99 is 4.6, so you could have a score of 5.7 with a well-trained BAYES.

Yes, that would be great. I will look at trying this. I do get tens of thousands of e-mails a day through this system, though, so it is hard to do manual processes. I need to play it conservative and can't afford FPs at all...

-- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21628480.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
> Can you repost that with full headers?

Yes, but I have to wait for more to come through, as I have gotten into the habit of just deleting the FNs.

> No DNSBL hits on the URI domain?

No, the domains change too quickly, so I almost never get DNSBL hits for these. I have DNSBL greylisting front-ending SA as well, and I get no hits there either. It is really annoying. Usually someone will submit them, though, and URIBL_BLACK will hit after a few. I've added a meta combining the URL check (below), URIBL_BLACK and DCC_CHECK; maybe all I really need to do is bump up the meta score for this combination?

> We'd need more than one sample URI to do a good job. Have you been collecting a corpus?

Not of FNs. I should collect them.

> I notice that this URI has a format that may be a good spam sign: the domain name, followed by a long string of unpunctuated text gibberish.

Here is what I have been using (from previous help from this mail list!):

uri SSS_URI30 /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/\w{30}\b/
score SSS_URI30 1.5

This uri rule does work very well, but they change the length sometimes, so I have a few rules that handle different lengths. Maybe I should use 29,31 instead of just 30, for example? Am I being too conservative? Should I consider bumping the score of this up more? And my meta up more perhaps?

-- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21628431.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
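Rather than one rule per length, a bounded quantifier can cover a range. A Python sketch using Dennis's 29 to 31 suggestion (the rule name and test host are hypothetical); Perl treats `{29,31}` the same way.

```python
import re

# One pattern covering path lengths 29-31 instead of one rule per length.
# The trailing \b stops a longer run of word characters from matching.
SSS_URI_RANGE = re.compile(r"\bhttp://[^./]+\.(?i:com|net|info|biz)/\w{29,31}\b")


def hits(path_len: int) -> bool:
    """Test the pattern against a hypothetical URI with a path of `path_len` letters."""
    url = "http://example.com/" + "x" * path_len
    return SSS_URI_RANGE.search(url) is not None
```

The trailing `\b` matters: without it, a 40-character run would still match, because its first 31 characters satisfy `\w{29,31}`.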
Re: excessive scan time
On 22-Jan-2009, at 13:57, Brian J. Murrell wrote:
> Now users need to know how to edit SQL records, or I need to install a web interface for that. The ROI here for that is just not high enough.

Really? A web interface to edit user configuration options in an SQL database is trivial. I know it's trivial because *I* can do it.

-- "Whose motorcycle is this?" "It's a chopper, baby." "Whose chopper is this?" "It's Zed's." "Who's Zed?" "Zed's dead, baby. Zed's dead."
Re: please help, getting hammered with snowshoe spam
> > why are those scores low? What gives them negative score?
> > those rules have quite high score...

On 23.01.09 08:26, Dennis Hardy wrote:
> Here is an example (without my rules): http://pastebin.com/m4400a74d

X-Spam-Status: No, score=1.1 required=5.0 tests=BAYES_05,DCC_CHECK,DIET_1, SPF_HELO_PASS,SPF_PASS autolearn=no version=3.2.5

Your BAYES is misfiring. The difference between BAYES_05 and BAYES_99 is 4.6, so you could have a score of 5.7 with a well-trained BAYES.

> The ones that get through are relatively short and simple, and many are very "clean". This example is just one that focuses on weight loss, some are regarding tea or satellite companies or coffee makers or the like. I worry about increasing FPs of real e-mails by training of "clean" spams as spam, when they are short and sweet and many times look like they could be legitimate e-mails.

Just train on them, and remember to train on clean mails too (especially those which will start getting a higher BAYES score).

> Also would training bayes on this sort of e-mail help if many things are different between each e-mail, and if the e-mail is so short and relatively "clean"? Addresses change, company names change, sender domains are always different, etc

If you trained with enough mail, it would help. However, the result says similar mails were trained as ham, which is what you should investigate and fix. On some mailboxes I keep trained ham/spam in special folders, so I can re-train, or make Bayes forget a message, whenever anything turns out to be incorrect.

> I've been thinking about maybe writing an SA plugin that counts the three repeated URL patterns that are always present in all of these spams, but I don't know where to start in trying to do that. I was hoping I could just handle this with SA rules or something (like using another RBL or something).

More mails could give an idea of what should be hit. Maybe a rule would be enough; there's no need to create a plugin. But I'm sure BAYES training should be enough for this mail...

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. Support bacteria - they're the only culture some people have.
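Matus's arithmetic, spelled out. The figures are the 1.1 total and 5.0 threshold from the X-Spam-Status line above plus his 4.6 BAYES_05 to BAYES_99 difference; the variable names are just for illustration.

```python
required = 5.0      # required=5.0 from X-Spam-Status
observed = 1.1      # score=1.1, with BAYES_05 among the hits
bayes_delta = 4.6   # BAYES_99 score minus BAYES_05 score, per Matus

with_bayes_99 = observed + bayes_delta
print(round(with_bayes_99, 1), with_bayes_99 >= required)  # 5.7 True
```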
Re: please help, getting hammered with snowshoe spam
On Fri, 23 Jan 2009, Dennis Hardy wrote:
> > why are those scores low? What gives them negative score? those rules have quite high score...
> Here is an example (without my rules): http://pastebin.com/m4400a74d

Can you repost that with full headers?

> The ones that get through are relatively short and simple, and many are very "clean".

No DNSBL hits on the URI domain?

> I've been thinking about maybe writing an SA plugin that counts the three repeated URL patterns that are always present in all of these spams, but I don't know where to start in trying to do that.

We'd need more than one sample URI to do a good job. Have you been collecting a corpus?

I notice that this URI has a format that may be a good spam sign: the domain name, followed by a long string of unpunctuated text gibberish. Just off the top of my head and untested, how does this do against your corpus?

uri GIBBERISH m;://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$;i

-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Gun Control is nothing more than an attempt to return to feudalism, where the peasants are helpless and must humbly petition their lord and master to protect them from bandits and thieves (when they can get around to it), and where the lords and masters can abuse the peasants whenever they like without fear of effective resistance. --- 4 days until Wolfgang Amadeus Mozart's 253rd Birthday
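A Python translation of John's untested GIBBERISH rule, handy for trying it against a corpus before loading it into SpamAssassin. Perl's trailing `i` flag becomes `re.IGNORECASE`; the test URIs are made up.

```python
import re

GIBBERISH = re.compile(
    r"://[^/]{4,50}/"                 # "://" plus a 4-50 character host part
    r"(?=[a-z]{25,80}$)"              # the whole path is 25-80 letters, nothing else
    r"[a-z]{0,80}q[^u][a-z]{0,80}$",  # ...and contains a "q" not followed by "u"
    re.IGNORECASE,
)


def is_gibberish(uri: str) -> bool:
    return GIBBERISH.search(uri) is not None
```

Because the lookahead already limits the path to letters, `[^u]` effectively means "any letter except u" here, and a path with underscores, dots or query strings never matches.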
Re: please help, getting hammered with snowshoe spam
> I've been using this rule to knock some of these down:
> [...]
> Highly unusual to have a url like that in ham...
> I'm running a meta to bump up the score...

Yes, I've actually been doing the very same thing (URI detection and metas, and then string matching in the tail part of the e-mail)! However, it has been getting tedious maintaining the string list manually, because the " Marketing" and " Media" etc. targets and addresses have been changing far more frequently now. They'll use them for a few days, then disappear completely, and new ones will appear. This type of spam is such a pain... Is there some more general way that this sort of thing could be handled?

-- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21628143.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
On Fri, 2009-01-23 at 07:56 -0800, Dennis Hardy wrote:
> Hi, I'm getting hammered by snowshoe spam :-( I've added rules to try to catch common formats of included URLs in the spam, but I'm wary of scoring these rules too high because of the potential for false positives. It's hard to come up with other rules as the spam e-mail content is so generic. By default these spams score incredibly low (bayes, etc.) In many cases, the low bayes values are scoring negative, which completely offsets the few positive scoring rules that I have added.

I've been using this rule to knock some of these down:

uri AE_ASM /\/[[:alpha:]]{28,40}$/
describe AE_ASM long gibberish path used by ASM Marketing
score AE_ASM 1

Highly unusual to have a url like that in ham... I'm running a meta to bump up the score...

-- Daniel J McDonald, CCIE #2495, CISSP #78281, CNX Austin Energy http://www.austinenergy.com
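The same idea in Python, for testing against saved URIs. Python's `re` has no POSIX `[[:alpha:]]` class, so `[A-Za-z]` stands in for it here; unlike Perl's class it only covers ASCII letters.

```python
import re

# AE_ASM's pattern: a "/" followed by 28-40 letters that run to the end of the URI.
AE_ASM = re.compile(r"/[A-Za-z]{28,40}$")


def ae_asm_hits(uri: str) -> bool:
    return AE_ASM.search(uri) is not None
```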
Re: please help, getting hammered with snowshoe spam
> Is this spam for snowshoes or some "spam term"?

"Like a snowshoe spreads the load of a traveler across a wide area of snow, some spammers use many frequently-changing IP addresses and domains to spread out the spam load in order to dilute recipient reputation metrics and evade filters." See http://www.spamhaus.org/faq/answers.lasso?section=Glossary#233

> If the former, put some example up on a pastebin (not here!).

Yes, already done: http://pastebin.com/m4400a74d

-- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21627984.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
Dennis Hardy wrote on Fri, 23 Jan 2009 07:56:44 -0800 (PST):
> Hi, I'm getting hammered by snowshoe spam

Is this spam for snowshoes or some "spam term"? If the former, put some example up on a pastebin (not here!).

Kai
-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com
Re: please help, getting hammered with snowshoe spam
> why are those scores low? What gives them negative score?
> those rules have quite high score...

Here is an example (without my rules): http://pastebin.com/m4400a74d

The ones that get through are relatively short and simple, and many are very "clean". This example is just one that focuses on weight loss; some are about tea or satellite companies or coffee makers or the like. I worry about increasing FPs on real e-mails by training "clean" spams as spam, when they are short and sweet and often look like they could be legitimate e-mails.

Also, would training Bayes on this sort of e-mail help if many things differ between each e-mail, and if the e-mail is so short and relatively "clean"? Addresses change, company names change, sender domains are always different, etc.

I've been thinking about maybe writing an SA plugin that counts the three repeated URL patterns that are always present in all of these spams, but I don't know where to start in trying to do that. I was hoping I could just handle this with SA rules or something (like using another RBL or something).

Thank you!

-- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21627664.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
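Before writing a Perl plugin, the counting idea can be prototyped outside SpamAssassin. This Python sketch is hypothetical throughout: the URL extractor, the "letters-only path of 25+ characters" test and the threshold of three are placeholders, since the actual repeated patterns are only in the pastebin sample.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical building blocks: a crude URL extractor and a test for a
# path that is nothing but a long run of letters.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
GIBBERISH_PATH_RE = re.compile(r"/[A-Za-z]{25,}")


def suspicious_hosts(body: str, threshold: int = 3) -> list:
    """Return hosts that appear in at least `threshold` gibberish-path URLs."""
    counts = Counter()
    for url in URL_RE.findall(body):
        parts = urlsplit(url)
        if GIBBERISH_PATH_RE.fullmatch(parts.path):
            counts[parts.hostname] += 1
    return sorted(host for host, n in counts.items() if n >= threshold)
```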
Re: please help, getting hammered with snowshoe spam
On 23.01.09 07:56, Dennis Hardy wrote:
> Hi, I'm getting hammered by snowshoe spam :-( I've added rules to try to catch common formats of included URLs in the spam, but I'm wary of scoring these rules too high because of the potential for false positives. It's hard to come up with other rules as the spam e-mail content is so generic. By default these spams score incredibly low (bayes, etc.) In many cases, the low bayes values are scoring negative, which completely offsets the few positive scoring rules that I have added.

Train Bayes properly; it's the first thing you should do for such mail.

> Are there other RBLs or domain checks or something that could be used to possibly get more indication that a spam is a snowshoe spam from a "bogus" domain? I've also added a meta rule that combines URIBL_BLACK, DCC_CHECK, and my rules...but spam still gets by many times because it scores so low/negative otherwise. Maybe I just need to score everything higher...?

Why are those scores low? What gives them a negative score? Those rules have quite high scores...

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. Quantum mechanics: The dreams stuff is made of.
Re: training for spamassassin
> Ralf Heidenreich wrote:
> > sa-learn coaches spamassassin.

On 23.01.09 10:45, Bowie Bailey wrote:
> Actually, sa-learn coaches the Bayes db if you want to be specific.

I prefer the word "train" instead of "coach" :-)

> > Is it better, to coach spamassassin with mails, that are not examined through spamassassin. Also original spam-mails.
> > If spamassassin examines mails, and writes a Spam-Status flag into the header, can these mails used for sa-learn?
>
> It doesn't matter. Train with everything you have. sa-learn will automatically remove any SA headers. It will also automatically skip any messages that have been previously learned, so you don't have to worry about learning the same email twice.

However, the most important thing is training on false positives and false negatives, then on mail that wasn't classified with a high or low spam probability, especially ham that had too high a score and spam that had too low a score. Yes, even training on BAYES_00 and BAYES_99 mail can give some advantages, but if you don't have much time, focus on those I described above.

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. My mind is like a steel trap - rusty and illegal in 37 states.
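Matus's order of priority can be expressed as a sort key. A Python sketch; the `Message` record and its fields are hypothetical and would be filled in however suits your setup (your own sorting of the mail plus SpamAssassin's verdict and Bayes probability).

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical record of one message in a training backlog.
    name: str
    is_spam: bool        # the correct classification (your judgement)
    flagged_spam: bool   # what SpamAssassin decided
    bayes_prob: float    # Bayes spam probability, 0.0 to 1.0


def training_priority(msg: Message) -> tuple:
    misclassified = msg.is_spam != msg.flagged_spam
    # Distance of the Bayes probability from the correct end of the scale.
    error = (1.0 - msg.bayes_prob) if msg.is_spam else msg.bayes_prob
    # False sorts before True, so misclassified mail comes first;
    # within each group, the largest Bayes error comes first.
    return (not misclassified, -error)


def training_order(messages):
    return sorted(messages, key=training_priority)
```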
please help, getting hammered with snowshoe spam
Hi, I'm getting hammered by snowshoe spam :-( I've added rules to try to catch common formats of included URLs in the spam, but I'm wary of scoring these rules too high because of the potential for false positives. It's hard to come up with other rules as the spam e-mail content is so generic. By default these spams score incredibly low (bayes, etc.) In many cases, the low bayes values are scoring negative, which completely offsets the few positive scoring rules that I have added. Are there other RBLs or domain checks or something that could be used to possibly get more indication that a spam is a snowshoe spam from a "bogus" domain? I've also added a meta rule that combines URIBL_BLACK, DCC_CHECK, and my rules...but spam still gets by many times because it scores so low/negative otherwise. Maybe I just need to score everything higher...? Any thoughts/advice are appreciated :-) -- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21627042.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: experienced comments on these rules and their effectiveness in large installations please
On 22.01.09 14:54, RobertH wrote:
> would those of you in the know please comment based upon your data re: the below rules and their effectiveness in hitting spam vrs ham and/or false readings in diverse or fairly diverse large scale isp and/or corporate installations please

I think they all have scores set according to their effectiveness and FP rate. Note that _DUL and _PBL only apply to the last external IP, so it's important to have trusted_networks and internal_networks set up correctly.

> RCVD_IN_BL_SPAMCOP_NET
> RCVD_IN_DSBL

This one is obsolete (DSBL is dead). Run sa-update to get fresh versions of all rules.

> RCVD_IN_NJABL_CGI
> RCVD_IN_NJABL_MULTI
> RCVD_IN_NJABL_PROXY
> RCVD_IN_NJABL_RELAY
> RCVD_IN_NJABL_SPAM
> RCVD_IN_SBL
> RCVD_IN_SORBS_BLOCK
> RCVD_IN_SORBS_DUL
> RCVD_IN_SORBS_HTTP
> RCVD_IN_SORBS_MISC
> RCVD_IN_SORBS_SMTP
> RCVD_IN_SORBS_SOCKS
> RCVD_IN_SORBS_WEB
> RCVD_IN_SORBS_ZOMBIE
> RCVD_IN_XBL
> RCVD_IN_PBL
> DNS_FROM_AHBL_RHSBL
> RCVD_IN_MAPS_RBL
> RCVD_IN_MAPS_DUL
> RCVD_IN_MAPS_RSS
> RCVD_IN_MAPS_NML

These are non-standard, and I don't know if anyone has published investigations and measurements of them here. Try searching...

> we are just wondering if we should use them or not, and specifically which ones are the best, if any at all.

I happily use them as they are distributed with SA (and sa-update). I don't use MAPS*. Our company does the same (though only I have been maintaining SA for the last few months).

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. If Barbie is so popular, why do you have to buy her friends?
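Why trusted_networks and internal_networks matter for the last-external checks can be shown with a simplified model: walk the relay IPs from the newest Received header down and stop at the first one outside your own networks. This Python sketch is a simplification, not SpamAssassin's actual trust-path code; the addresses are private and RFC 5737 documentation ranges.

```python
import ipaddress


def last_external_ip(relay_ips, internal_networks):
    """Return the first relay (newest first) outside the internal networks.

    Simplified model: `relay_ips` lists the IPs from Received headers,
    newest first, and every hop inside `internal_networks` is ours.
    """
    nets = [ipaddress.ip_network(n) for n in internal_networks]
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    return None
```

With an empty network list, the lookup lands on your own relay (10.0.0.5 in the test data) instead of the host that actually handed the mail to you, so a dynamic-IP list would be checked against the wrong address.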
Re: excessive scan time
Brian J. Murrell wrote: I'd also suggest using SQL for user preferences. The user interface (i.e. editing a file) for user preferences is a different story. Now users need to know how to edit SQL records, or I need to install a web interface for that. Or you use a small script that reads the users preferences from file (when the file has been modified) and updates the SQL database. Regards /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
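Jonas's suggestion, sketched in Python with `sqlite3` standing in for whatever database SpamAssassin is pointed at. The `userpref` table with `username`, `preference` and `value` columns is an assumption modelled on SpamAssassin's usual SQL user-preference layout; check it against your own schema. `open_db`, `parse_prefs` and `sync_prefs` are hypothetical names.

```python
import os
import sqlite3


def open_db(db_path=":memory:"):
    """Open the database, creating an assumed userpref table for the demo."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS userpref "
        "(username TEXT, preference TEXT, value TEXT)"
    )
    return conn


def parse_prefs(lines):
    """Turn 'preference value' lines into pairs, skipping blanks and # comments."""
    prefs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        prefs.append((parts[0], parts[1] if len(parts) > 1 else ""))
    return prefs


def sync_prefs(conn, username, path, last_mtime=None):
    """Copy a user's prefs file into userpref if it changed since last_mtime.

    Returns the file's modification time for the caller to pass back next run.
    """
    mtime = os.path.getmtime(path)
    if last_mtime is not None and mtime <= last_mtime:
        return mtime  # file unchanged since the last sync
    with open(path, encoding="utf-8") as fh:
        prefs = parse_prefs(fh)
    with conn:  # one transaction: replace this user's rows wholesale
        conn.execute("DELETE FROM userpref WHERE username = ?", (username,))
        conn.executemany(
            "INSERT INTO userpref (username, preference, value) VALUES (?, ?, ?)",
            [(username, key, value) for key, value in prefs],
        )
    return mtime
```

Run from cron, this keeps the file as the user-facing interface while SpamAssassin reads from SQL.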
RE: training for spamassassin
Ralf Heidenreich wrote: > Hello, > > sa-learn coaches spamassassin. Actually, sa-learn coaches the Bayes db if you want to be specific. > Is it better, to coach spamassassin with mails, that are not examined > through spamassassin. Also original spam-mails. > If spamassassin examines mails, and writes a Spam-Status flag into the > header, can these mails used for sa-learn? It doesn't matter. Train with everything you have. sa-learn will automatically remove any SA headers. It will also automatically skip any messages that have been previously learned, so you don't have to worry about learning the same email twice. > greetings Ralf -- Bowie
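A toy model of the two behaviours Bowie describes, in Python: drop the X-Spam-* headers SpamAssassin adds, then skip anything already learned. This is not how sa-learn is implemented; the SHA-256 "seen" key and the `Learner` class are hypothetical.

```python
import hashlib
from email import message_from_bytes
from email.message import Message


def strip_sa_headers(raw: bytes) -> Message:
    """Parse a message and drop the X-Spam-* headers SpamAssassin adds."""
    msg = message_from_bytes(raw)
    for name in {k for k in msg.keys() if k.lower().startswith("x-spam-")}:
        del msg[name]  # deletes every occurrence of that header
    return msg


def message_key(msg: Message) -> str:
    """Hypothetical 'already learned' key: a hash of what is left."""
    return hashlib.sha256(msg.as_bytes()).hexdigest()


class Learner:
    def __init__(self) -> None:
        self.seen = set()

    def learn(self, raw: bytes) -> bool:
        """Return True if the message was learned, False if already seen."""
        key = message_key(strip_sa_headers(raw))
        if key in self.seen:
            return False
        self.seen.add(key)
        # A real learner would tokenize the message and update Bayes here.
        return True
```

Because the headers are stripped before the key is computed, a copy that went through SpamAssassin and a pristine copy count as the same message.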
training for spamassassin
Hello,

sa-learn coaches SpamAssassin. Is it better to coach SpamAssassin with mails that have not been examined by SpamAssassin, i.e. original spam mails? If SpamAssassin examines mails and writes a Spam-Status flag into the header, can these mails be used for sa-learn?

greetings Ralf
Re: experienced comments on these rules and their effectiveness in large installations please
RobertH wrote on Thu, 22 Jan 2009 14:54:41 -0800:
> would those of you in the know please comment based upon your data re: the below rules and their effectiveness in hitting spam vrs ham and/or false readings in diverse or fairly diverse large scale isp and/or corporate installations please

Fairly easy: run one week with default settings and one week with "skip_rbl_checks 1", then compare. In general, these rules will provide hits if you don't use RBLs at the MTA level. If you use RBLs to reject at the MTA level, they won't hit much.

Kai
-- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com