Re: please help, getting hammered with snowshoe spam
Rob McEwen wrote: >(2) ivmSIP/24 is attempting a very dangerous mission... which is to >preemptively block snowshoe spam by listing entire /24 blocks when >only a handful of IPs on that block have sent spam so far. But keep >in mind that (a) specifically--ivmSIP is going to block some spam >where that snowshoer hadn't sent from enough IPs to possibly be >listed (yet!) on ivmSIP/24 AND (b) The reason I call ivmSIP/24's >mission as "dangerous" is because there is a high risk of FPs >whereby spammers and legit senders share blocks of IPs within the >*same* /24 block. I've taken steps to greatly minimize that amount >of time that happens... but it is almost impossible to prevent this >altogether. Therefore, both medium-to-large ISPs and those who are >extremely concerned about FPs should use ivmSIP/24 for scoring >instead of blocking--in spite of my continued attempts to get >ivmSIP/24 to have just as few FPs as ivmSIP. (and I'm still working >on that!) Rob, yes, I'm with you there. :) I'm also sympathetic to your lawsuit concerns. There's abundant horror stories, here and elsewhere, about unskilled sysadmins improperly implementing an RBL, and outright blocking on DNS data that was meant to be ADVISORY only. However, the snowshoe problem has gotten so bad, I've started "labelling" all ranges of any host when I find enough "pure" snowshoe blocks in their space. I do NOT score on these merely "labelled" ranges, but use them as the equivalent of an SA "meta", in combination with the other tests I mentioned previously (i.e. on Barracuda, has an unsubscribe phrase, has a "teaser" phrase in From/Subject). I'm finding that is extremely effective, and some combos (reliable "teaser" + any other single test) have zero FPs (so far). My own IP-to-Nation data file (both real and hand-classified "virtual" nations) is only used by my own people (all somewhat cautiously screened). 
I write all the "base" rules, and we have a kick-butt FP pipeline, so I don't have to worry about a random user misunderstanding what a particular IP block classification is for. I can be far more aggressive than most. :) What I want to do is expand my merely-labelled IP ranges, and was hoping I could do a straight import of your /24 list into its own unique country code, then run some MassChecks, and see how that goes. Ideally, that should be helpful to both us and you. >(4) And I'm about to implement a large improvement to ivmSIP. I >found a bug in the programming (that had been there all along) which >was preventing some deserving IP from getting into ivmSIP. So ivmSIP >is about to get better. Therefore, substantial improvements are >about to happen to BOTH ivmSIP and ivmSIP.24 --therefore, I'd prefer >that any publicly available stats/testing be done in a week or two >from now--AFTER these improvements are made. I understand about you wanting to review your data first, so no pressure. :) I would be happy to do a non-published "quick" look if you like, then send you any FPing-IPs I see, and wait until you're happy with your own data before I shared any public results. >(5) regarding the "shared hosting environment"... if ALL of these >mail servers resolve their queries using the *same* locally hosted >DNS server for resolving queries, then there is only need for a >single setup of the lists, for that one DNS server--and then there'd >be a single price based on the cumulative total number of >mailboxes--and, therefore, many quantity discounts would apply (or, >am I not understanding you? Aren't these all hosted at the same >physical location?... or multiple datacenters owned by the same >company?) I should have been clearer: I am _NOT_ a sysadmin/mailadmin. I'm justaprogrammer. :) About five years ago, a volunteer written filter at the main host I was using, broke. I ended up fixing it, which started me down the path of filter programming. 
:) Initially, my goal was merely to "fill in the holes" that exist in a shared hosting environment (where SA's full potential is limited by the need to target the lowest-common-denominator). It turned into a much larger project when I realized the data analysis potential of hand-classified data from a diverse group of smallish domains. All my volunteers grasp that they're helping each other, and are very motivated & enthusiastic. The project is still rather small (about forty domains, with about half a million spams per month), however it's a nice size and quality for doing serious research. :) We're split among several different hosts, so the only way it would be viable to use your lists in real-time, would be to set up our own DNS server, only known to project members. Since most of us are only receiving a trickle of snowshoe spam, that's not viable at this time. The ones who receive more than a trickle, receive a FLOOD. As I mentioned, in some cases 80% of their FNs are from snowshoers. - "Chip"
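A minimal Python sketch of the "labelled range" idea described in this message, assuming a plain-text list with one CIDR per line. The label name, the helper names, and the sample ranges (RFC 5737 documentation addresses) are hypothetical and are not taken from any real list.

```python
import ipaddress

# Hypothetical sample of a one-CIDR-per-line list. These are RFC 5737
# documentation ranges, not real listings.
SAMPLE_LIST = """\
# comment lines and blank lines are skipped
192.0.2.0/24

198.51.100.0/24
"""

def parse_cidr_list(text):
    """Return one ip_network object per non-comment, non-blank line."""
    networks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        networks.append(ipaddress.ip_network(line, strict=False))
    return networks

def label_for(ip, networks, label="SNOWSHOE_24"):
    """Return the (non-scoring) label if ip is inside any range, else None."""
    addr = ipaddress.ip_address(ip)
    for net in networks:
        if addr in net:
            return label
    return None

networks = parse_cidr_list(SAMPLE_LIST)
```

The label is only a classification here. Whether it contributes to a score is left to separate combination tests, which matches the "label first, score only in combination" approach described above.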
Re: please help, getting hammered with snowshoe spam
While reading the "html picture spam" thread, it occurred to me to check the sizes of Ham hitting Barracuda. The largest one was 113,351 bytes. I then checked the nation-of-origin for all Barracuda hitting "large" spams (msg size >= 256 kb), and (during the 3-week period I checked) only 4 out of 190 were from non-snowshoe IP ranges. Actually, it was a bit more, but a quick review of them resulted in me moving a few into my snowshoe "virtual" nations. :) I've just added that as an extra test (i.e. on Barracuda plus "large" message size), currently scored at the equivalent of about 1 SA point. I forgot to mention another combo test: if it's on both Barracuda and the Day-Old-Bread list, I add the equivalent of about 1 SA point. Zero FPs so far. I'll review all those scores and tests in a few more weeks. - "Chip"
Re: please help, getting hammered with snowshoe spam
Chip M. wrote:
> *** Rob McEwen: ***
> Would you be willing to provide your /24 list, for even a short period,
> in some sort of plain text format (maybe one CIDR per line?), so those
> of us with good hand-classified corpi could try out your data?
>
> Most of my users are in a shared hosting environment, so they can't use
> your list suite as-is. Based on what reliable people have posted, some
> of my users should probably benefit from your /24 list. I'd be very
> glad to provide you with a list of any FPs I find. :)

Chip,

Here are some thoughts:

(1) If you are discussing hostkarma and barracuda's lists, then ivmSIP is probably a closer equivalent to compare against than ivmSIP/24. And they both work together VERY well for blocking snowshoe spam. Moreover, I contend that the combination of my three lists (ivmSIP, ivmSIP/24, and ivmURI), working together (and even if using ivmSIP/24 in scoring mode), is the best and most cost-effective solution specifically for blocking hard-to-catch snowshoe spam.

(2) ivmSIP/24 is attempting a very dangerous mission... which is to preemptively block snowshoe spam by listing entire /24 blocks when only a handful of IPs on that block have sent spam so far. But keep in mind that (a) specifically--ivmSIP is going to block some spam where that snowshoer hadn't sent from enough IPs to possibly be listed (yet!) on ivmSIP/24 AND (b) the reason I call ivmSIP/24's mission "dangerous" is that there is a high risk of FPs whereby spammers and legit senders share blocks of IPs within the *same* /24 block. I've taken steps to greatly minimize how often that happens... but it is almost impossible to prevent this altogether. Therefore, both medium-to-large ISPs and those who are extremely concerned about FPs should use ivmSIP/24 for scoring instead of blocking--in spite of my continued attempts to get ivmSIP/24 to have just as few FPs as ivmSIP. (and I'm still working on that!)

(3) Along these lines, I'm just about to make substantial changes to ivmSIP/24--so that (a) in many cases, it will list subranges instead of the whole /24 block and (b) that way, when I'm faced with a decision about removing an ivmSIP/24 listing so as to not hurt an innocent sender sharing a block with an egregious spammer... I can then "have my cake and eat it too"--I can avoid more innocent IPs... but then NOT have to give the spammers a pass by delisting the whole /24 block--as I'm sometimes having to do now. (I often use this to put pressure on hosters to remove the spammers FIRST--but I can only do so much of that--playing that game takes tremendous time and resources--and has large lawsuit risks!)

(4) And I'm about to implement a large improvement to ivmSIP. I found a bug in the programming (that had been there all along) which was preventing some deserving IPs from getting into ivmSIP. So ivmSIP is about to get better. Therefore, substantial improvements are about to happen to BOTH ivmSIP and ivmSIP/24--therefore, I'd prefer that any publicly available stats/testing be done in a week or two from now--AFTER these improvements are made.

(5) Regarding the "shared hosting environment"... if ALL of these mail servers resolve their queries using the *same* locally hosted DNS server, then there is only need for a single setup of the lists, for that one DNS server--and then there'd be a single price based on the cumulative total number of mailboxes--and, therefore, many quantity discounts would apply (or, am I not understanding you? Aren't these all hosted at the same physical location?... or multiple datacenters owned by the same company?)

--
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032
Re: please help, getting hammered with snowshoe spam
This snowshoe stuff has been a PITA for a while. For most of my users (particularly the Geeks), it's not even on their radar. For others (including my most complex domain), 80% of their FNs are from snowshoers.

As well as the usual battery of anti-spam tests, I'm using a layered/meta approach of tests:

1. "teaser" header word checks (see below)
2. sender IP checking against large hosts that have been known to host snowshoers (hand-maintained)
3. unsubscribe phrase(s) in the body
4. Barracuda

If you look at several snowshoe samples, you'll note that the "From" and/or "Subject" pretty much ALWAYS contain some sort of "teaser" word(s). Those are the two headers that are (always?) displayed to the potential victim, so the spammer has a strong incentive to continue using those to try to lure in the victim. They're a VERY good target for new rules.

I've broken these "teasers" down into three general groups (and score accordingly):

A. specific product names (e.g. "pedi paws"), which are high-quality/low-risk spam signs
B. generic product names (e.g. "green tea"), which are medium-quality/medium-risk spam signs
C. general terms (e.g. many variations on "insurance"), which are medium-quality/higher-risk spam signs

I've never had an FP on the first group, and they're really easy to spot and add to my rules. I've even begun pre-emptively listing anything I notice while watching TV. The Weather Channel is particularly useful for that. :)

The last group is the tricky one, and pretty much has to be used in metas with the other rule groups listed above.

I regularly update my list of "active" snowshoe IP ranges, which catches most of these. That's my single most time-intensive non-coding task in all of my anti-spam work. I've gotten to the point where, if I notice more than a few /24s in any one webhost's IP space, I re-classify _ALL_ of their blocks with a generally non-scoring code, then use that as a meta at run-time. The main problem is that I need more data to expand these.
Anything which is sent from any of those IP blocks, then gets a HUGE bonus if there is either a weak "teaser" and/or an unsubscribe term in the body. I'm planning to add another meta bonus rule for anything that's on Barracuda. I've found that HostKarma's blocklist is about as efficacious as Barracuda, however I've experienced some timeouts, and some hinky whitelist results, so I'm only using it in my FP pipeline, where it has been extremely useful (Mark, if you're reading this, I'd be very happy to send you more details and any specific data that would be helpful to you - feel free to contact me off-list). Some snowshoers have started putting the unsub link in a GIF, so I'll be adding some rules for that, soonish. *** Rob McEwen: *** Would you be willing to provide your /24 list, for even a short period, in some sort of plain text format (maybe one CIDR per line?), so those of us with good hand-classified corpi could try out your data? Most of my users are in a shared hosting environment, so they can't use your list suite as-is. Based on what reliable people have posted, some of my users should probably benefit from your /24 list. I'd be very glad to provide you with a list of any FPs I find. :) Contact me off-list, if you'd prefer.
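The layered approach in this message can be sketched roughly as follows. This is an illustration only: every point value is a made-up placeholder, the field names are hypothetical, and tying the planned Barracuda bonus to the labelled ranges is one reading of the message, not something it states outright.

```python
from dataclasses import dataclass

# All point values are placeholders, not the poster's real scores.
STRONG_TEASER_POINTS = 3.0   # group A: specific product names
LABELLED_COMBO_BONUS = 4.0   # labelled range + weak teaser and/or unsubscribe
BARRACUDA_COMBO_BONUS = 1.0  # labelled range + Barracuda hit (assumed reading)

@dataclass
class Signals:
    in_labelled_range: bool  # sender IP is inside a hand-labelled webhost range
    strong_teaser: bool      # group A term found in From/Subject
    weak_teaser: bool        # group B or C term found in From/Subject
    has_unsubscribe: bool    # unsubscribe phrase found in the body
    on_barracuda: bool       # sender IP is listed on Barracuda

def layered_score(s: Signals) -> float:
    """Combine the individual signals the way a meta rule would."""
    score = 0.0
    if s.strong_teaser:
        score += STRONG_TEASER_POINTS
    # The label by itself never scores. It only unlocks the combinations.
    if s.in_labelled_range:
        if s.weak_teaser or s.has_unsubscribe:
            score += LABELLED_COMBO_BONUS
        if s.on_barracuda:
            score += BARRACUDA_COMBO_BONUS
    return score
```

Keeping the label non-scoring means a legitimate sender inside a labelled range is only affected when a second, independent sign is also present.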
Re: please help, getting hammered with snowshoe spam
Dennis Hardy wrote:
>Do people generally have good non-FP experience with BRBL? I am
>thinking of bumping up the score, but I get so much spam per day
>it is hard to check for FPs with it enabled.

Dennis, it depends on what sort of ham your people receive.

For evaluation purposes, I've been running Barracuda on three diverse domains (one Geek, one "pure business", one mixed business&family). Each maintains a decent-to-excellent hand-classified corpus (they share summary data with me).

The Geek domain had 2 Barracuda FPs (between 28-Sep-2008 and today). Both were from the same IP, so I've merely skip listed it. Unfortunately, that particular sender was a webhost, with one of the FPs being critically time-sensitive, so I consider those FPs to be completely unacceptable (albeit easily avoided). Both of those emails received an SA score of "0.0", so the mentioned score of "3.0" would NOT have stopped them (that particular webhost is extremely Geeky, and doesn't commit any HTML atrocities).

The "pure business" domain had a zero Barracuda FP rate (note it's only been running since 24-Jan-2009).

The other domain (running Barracuda since 16-Jan-2009) receives a LOT of requested mailing list traffic (Constant Contact, Cheetah, etc), and has had a significant number of FPs.

Here are the number of Barracuda hits for the last two weeks, for the domain with FPs:

    spam  5005
    ham     38

Of the ham IPs, 22 had been previously classified as (generally) legit bulk mailers (i.e. "ESP"s). Visual inspection of the rest showed that _ALL_ were some sort of mailing list, mostly business oriented, the rest charitable or social. When I sorted that data by SA score, it was uniformly distributed across the different types of senders.

Here is the breakdown of SA scores for the ham:

    SA range      Hits  Percent
    (range lost)     2     5.3%
    0.0 - 0.6        7    18.4%
    1.0 - 1.8       13    34.2%
    2.0 - 2.4        3     7.9%
    3.0 - 3.5        7    18.4%
    4.0 - 4.2        6    15.8%

During that same period, there were 16 hams that had SA scores above the cutoff threshold.
If I had scored Barracuda at "3.0", the potential FPs would have doubled. Note that I am not currently running Barracuda via SA (I'm doing the testing in a different filter which runs right after SA). Bottom-line: Depending on the nature of your ham, you are likely to get some FPs, even at the mentioned score of "3.0". If you have a weak FP pipeline, then be very cautious. Consider scoring Barracuda weakly, and using it in a "meta" context. If anyone wants it, I can dump the specific SA tests for those FPs, as well as a separate list of the spam hits (should be useful for creating meta rules). I will also update those stats in a few more weeks/months.
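As a quick arithmetic check, the percentages in the ham breakdown can be recomputed from the hit counts. The counts below (2, 7, 13, 3, 7, 6) are as read from the table in this message; the label of the first band did not survive in the archive, so only its count is used.

```python
# Barracuda hits over the two-week period, from the message.
spam_hits = 5005
ham_hits = 38

# Ham hits per SA score band, in the order the table lists them.
band_hits = [2, 7, 13, 3, 7, 6]

total = sum(band_hits)
percents = [round(100 * h / total, 1) for h in band_hits]

# Share of all Barracuda hits that were ham, in percent.
ham_share = round(100 * ham_hits / (spam_hits + ham_hits), 2)
```

The bands add up to the 38 ham hits reported, which suggests the six rows are the complete table.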
RE: please help, getting hammered with snowshoe spam
> Do people generally have good non-FP experience with BRBL? I am
> thinking of bumping up the score, but I get so much spam per day it is
> hard to check for FPs with it enabled. It seems like a great resource,
> will it be pushed out with "sa-update" soon? I believe it is enabled in
> svn, from what I've read.

On one of the systems we run we set it to 0.1 initially to see how it went. After three months of monitoring we upped it to 3.0 and have never had any problems.

However, you have to take this in the context of the other settings and mail throughput for this particular system: a tagging score of 4 and a drop score of 12 (yes, this is a bit high), on roughly 4000 emails per day (after zen.spamhaus.org DNSBL blocking).

Faris.
Re: please help, getting hammered with snowshoe spam
Yes, it has been a problem as there are so many domains used. However..I took everyone's earlier suggestions, including training Bayes against FN snowshoe spam and adding the Barracuda RBL (BRBL), and this appears to almost completely take care of the problem!! So far I have been able to remove all of my custom rules except for BRBL of course, and only a few of these snowshoe spams get through now. Nice! Do people generally have good non-FP experience with BRBL? I am thinking of bumping up the score, but I get so much spam per day it is hard to check for FPs with it enabled. It seems like a great resource, will it be pushed out with "sa-update" soon? I believe it is enabled in svn, from what I've read. Also I am using policyd-weight to do front-end greylisting if the DNSBL checks trigger as this reduces load on the server. Can anyone suggest how to enable the BRBL in policyd-weight? I'm not sure what values to use. Again thank you for your help with this problem! It is great to see SA working so well now against it :-) -- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21792616.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
Karsten Bräckelmann wrote on Fri, 30 Jan 2009 20:25:52 +0100: > Dennis clearly stated a *week* ago that the "domains change too > quickly" (actual quote). Getting them listed will not help him. Oh, and > don't you think he would have created a trivial uri rule already, if > that would get them caught? Obviously they are caught for others ;-) Either by Bayes, rules, network checks or other measure. It's never a "one hits them all" solution, so adding a spam domain to uribl is always good. Kai -- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com
Re: please help, getting hammered with snowshoe spam
On Fri, 2009-01-30 at 18:28 +0100, Benny Pedersen wrote: > On Fri, January 23, 2009 17:36, Dennis Hardy wrote: > > > Yes already done: http://pastebin.com/m4400a74d > > why not get it listed on http://uribl.com/ ? Benny, this is going to help how? Dennis clearly stated a *week* ago that the "domains change too quickly" (actual quote). Getting them listed will not help him. Oh, and don't you think he would have created a trivial uri rule already, if that would get them caught? -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: please help, getting hammered with snowshoe spam
Benny Pedersen wrote: > On Fri, January 23, 2009 17:36, Dennis Hardy wrote: > >> Yes already done: http://pastebin.com/m4400a74d >> > why not get it listed on http://uribl.com/? > Both uribl and ivmURI listed this domain back on January 23rd. But it is unclear exactly *when* this spam sample was sent because the person who started this thread didn't include full headers. So it is unclear if the message hit this guy's server before these two URI blacklists listed that domain? or after? (I'm guessing after?) -- Rob McEwen http://dnsbl.invaluement.com/ r...@invaluement.com +1 (478) 475-9032
Re: please help, getting hammered with snowshoe spam
On Fri, January 23, 2009 17:36, Dennis Hardy wrote: > Yes already done: http://pastebin.com/m4400a74d why not get it listed on http://uribl.com/ ? -- http://localhost/ 100% uptime and 100% mirrored :)
Re: please help, getting hammered with snowshoe spam
Dennis Hardy wrote:
>> Is this spam for snowshoes or some "spam term"?
>
> "Like a snowshoe spreads the load of a traveler across a wide area of snow,
> some spammers use many frequently-changing IP addresses and domains to
> spread out the spam load in order to dilute recipient reputation metrics and
> evade filters."
>
> see http://www.spamhaus.org/faq/answers.lasso?section=Glossary#233
>
>> If the former, put some example up on a pastebin (not here!).
>
> Yes already done: http://pastebin.com/m4400a74d

You need to show full headers. There are generally patterns in the envelope sender and in a few headers.

Also, consider using BRBL:

header   RCVD_IN_BRBL  eval:check_rbl('brbl-lastexternal', 'bb.barracudacentral.org.')
describe RCVD_IN_BRBL  Received via a relay in Barracuda BRBL
tflags   RCVD_IN_BRBL  net
score    RCVD_IN_BRBL  3.0

Adjust the score, of course.
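For readers unfamiliar with how an IP-based DNSBL such as the one in the rule above is consulted: the client reverses the octets of the IPv4 address and looks the result up under the list's zone. A minimal Python sketch of that name construction follows. The function name is hypothetical and no DNS query is actually sent.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str = "bb.barracudacentral.org") -> str:
    """Build the DNS name a DNSBL client would look up for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# 192.0.2.10 is an RFC 5737 documentation address used purely as an example.
example = dnsbl_query_name("192.0.2.10")
```

By the usual DNSBL convention, an A record in the answer means the address is listed and NXDOMAIN means it is not.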
Re: please help, getting hammered with snowshoe spam
Everyone has given very helpful feedback! At present it definitely sounds like I should tweak my rules and train my bayes. I will try taking steps here and see how it goes. Thank you all so very much! -- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21631249.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
Dennis Hardy wrote: Hi, I'm getting hammered by snowshoe spam :-( Any thoughts/advice are appreciated :-) When this started happening to us the only solution I found was manual CIDR blocks. Yea I know very last millennium but I didn't find anything else to work with. Some particular snowshoers had patterns I could use but it seemed the addresses under attack were rapidly passed out among a large number of different outfits each with different styles. Bayes did not help sadly. Derek
Re: please help, getting hammered with snowshoe spam
Dennis Hardy wrote on Fri, 23 Jan 2009 08:36:59 -0800 (PST):

> see http://www.spamhaus.org/faq/answers.lasso?section=Glossary#233

Ah. I know a lot of spam terms, but this is certainly new to me ;-)

> > If the former, put some example up on a pastebin (not here!).
>
> Yes already done: http://pastebin.com/m4400a74d

As it doesn't contain any headers, I can't tell whether I would have rejected it at the MTA anyway. I get:

X-Spam-Report:
 * 5.0 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
 *     [score: 1.]
 * 3.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
 *     [URIs: twolumpsofcoal.net]
 * 0.1 DIET_1 BODY: Lose Weight Spam

It may not have been in URIBL_BLACK at the time you got it. But there are two other good rules that hit on it. As you are getting BAYES_05, there's something wrong with your Bayes, I'd say.

Kai

--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
Re: please help, getting hammered with snowshoe spam
On Fri, 23 Jan 2009, Dennis Hardy wrote:

> Here is what I have been using (from previous help from this mail list!):
>
> uri SSS_URI30 /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/\w{30}\b/
> score SSS_URI30 1.5
>
> this uri rule does work very well. but they change the length sometimes, so
> I have a few rules that handle different lengths. Maybe I should use 29,31
> instead of just 30 for example? Am I being too conservative? Should I
> consider bumping the score of this up more? And my meta up more perhaps?

Again, I'd have to see more examples to comment meaningfully. I would be especially interested in whether or not the part after the domain name is indeed free from punctuation. A long string of unpunctuated letters is less likely to FP than a long string of letters, numbers and underscores.

You might want to anchor your rule with a $ as it may FP if there is stuff in the URI following the string of gibberish. Try it against this very legitimate looking (if overly verbose) URI:

http://fnord.com/retrieve_document_as_pdf3_file.php?123456

And the rule I suggested makes an attempt to detect gibberish by looking for a "q" that is not followed by a "u", which is rare in English words.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Vista: because the audio experience is *far* more important than network throughput.
---
4 days until Wolfgang Amadeus Mozart's 253rd Birthday
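The false-positive concern raised in this reply can be checked directly. The following Python sketch translates the SSS_URI30 pattern (Python's `re` accepts the same syntax for this particular expression) and compares it with a variant anchored with `$`. The gibberish URI is a hypothetical stand-in, since no spam sample with a full URI appears in the thread.

```python
import re

# Python translation of the SSS_URI30 pattern quoted above.
UNANCHORED = re.compile(r"\bhttp://[^./]+\.(?i:com|net|info|biz)/\w{30}\b")

# Variant: the 30 word characters must be the end of the URI.
ANCHORED = re.compile(r"\bhttp://[^./]+\.(?i:com|net|info|biz)/\w{30}$")

# The legitimate-looking URI suggested in the message.
LEGIT = "http://fnord.com/retrieve_document_as_pdf3_file.php?123456"

# Hypothetical spam-style URI: a domain followed by exactly 30 letters.
SPAMMY = "http://example.com/" + "abcdefghij" * 3

unanchored_hits_legit = UNANCHORED.search(LEGIT) is not None
anchored_hits_legit = ANCHORED.search(LEGIT) is not None
```

The path segment "retrieve_document_as_pdf3_file" is exactly 30 word characters long and is followed by a ".", so the unanchored rule fires on it while the anchored one does not. Both variants still match the 30-letter stand-in.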
Re: please help, getting hammered with snowshoe spam
> your BAYES is misfiring. The difference between BAYES_05 and BAYES_99 is 4.6
> so you could have score of 5.7 if you'd have well-trained BAYES.

Yes, that would be great. I will look at trying this. I do get tens of thousands of e-mails a day through this system though, so it is hard to do manual processes. I need to play conservative and can't afford FPs at all...

--
View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21628480.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
> Can you repost that with full headers?

Yes, I have to wait for more to come through though, as I have gotten into the habit of just deleting the FNs.

> No DNSBL hits on the URI domain?

No, the domains change too quickly, so I almost never get DNSBL hits for these. I have DNSBL greylisting front-ending SA as well, and I get no hits there either. It is really annoying. Usually someone will submit and URIBL_BLACK will hit after a few though. I've added a meta for the URL check (below) and URIBL_BLACK and DCC_CHECK; maybe all I really need to do is bump up the meta score for this combination?

> We'd need more than one sample URI to do a good job. Have you been
> collecting a corpus?

Not of a FN set. I should collect this.

> I notice that this URI has a format that may be a good spam sign: the
> domain name, followed by a long string of unpunctuated text gibberish.

Here is what I have been using (from previous help from this mail list!):

uri SSS_URI30 /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/\w{30}\b/
score SSS_URI30 1.5

This uri rule does work very well, but they change the length sometimes, so I have a few rules that handle different lengths. Maybe I should use \w{29,31} instead of just \w{30}, for example? Am I being too conservative? Should I consider bumping the score of this up more? And my meta up more perhaps?

--
View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21628431.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
> > why are those scores low? What gives them negative score?
> > those rules have quite high score...

On 23.01.09 08:26, Dennis Hardy wrote:
> Here is an example (without my rules): http://pastebin.com/m4400a74d

X-Spam-Status: No, score=1.1 required=5.0 tests=BAYES_05,DCC_CHECK,DIET_1,
        SPF_HELO_PASS,SPF_PASS autolearn=no version=3.2.5

Your BAYES is misfiring. The difference between BAYES_05 and BAYES_99 is 4.6, so you could have a score of 5.7 if you had a well-trained BAYES.

> The ones that get through are relatively short and simple, and many are very
> "clean". This example is just one that focuses on weight loss, some are
> regarding tea or satellite companies or coffee makers or the like. I worry
> about increasing FPs of real e-mails by training of "clean" spams as spam,
> when they are short and sweet and many times look like they could be
> legitimate e-mails.

Just train on them, and remember to train on clean mails too (especially those which start getting a higher BAYES score).

> Also would training bayes on this sort of e-mail help if many things are
> different between each e-mail, and if the e-mail is so short and relatively
> "clean"? Addresses change, company names change, sender domains are always
> different, etc

If you trained with enough mail, it would help. However, the result says similar mails were trained as ham, which is what you should investigate and fix.

On some mailboxes I keep the trained ham/spam in special folders, so I can re-train or forget at any time if anything was classified incorrectly.

> I've been thinking about maybe writing an SA plugin that counts the three
> repeated URL patterns that are always present in all of these spams, but I
> don't know where to start in trying to do that. I was hoping I could just
> handle this with SA rules or something (like using another RBL or
> something).

More mails could give an idea of what should be hit. Maybe a rule would be enough, with no need to create a plugin.
But I'm sure BAYES training should be enough for this mail... -- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. Support bacteria - they're the only culture some people have.
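The arithmetic behind the "score of 5.7" remark, written out as a small Python check. The figures are the ones given in this message (observed score 1.1, threshold 5.0, a 4.6 point gap between BAYES_05 and BAYES_99); the variable names are illustrative only.

```python
observed_score = 1.1   # score from the X-Spam-Status header in the message
required = 5.0         # "required" threshold from the same header
bayes_gap = 4.6        # BAYES_99 minus BAYES_05, as stated in the message

# Replace the BAYES_05 contribution with BAYES_99 and leave the rest alone.
projected = round(observed_score + bayes_gap, 1)
would_be_tagged = projected >= required
```

With every other rule unchanged, the same message would cross the 5.0 threshold on the Bayes result alone.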
Re: please help, getting hammered with snowshoe spam
On Fri, 23 Jan 2009, Dennis Hardy wrote:

> > why are those scores low? What gives them negative score?
> > those rules have quite high score...
>
> Here is an example (without my rules): http://pastebin.com/m4400a74d

Can you repost that with full headers?

> The ones that get through are relatively short and simple, and many are very
> "clean".

No DNSBL hits on the URI domain?

> I've been thinking about maybe writing an SA plugin that counts the three
> repeated URL patterns that are always present in all of these spams, but I
> don't know where to start in trying to do that.

We'd need more than one sample URI to do a good job. Have you been collecting a corpus?

I notice that this URI has a format that may be a good spam sign: the domain name, followed by a long string of unpunctuated text gibberish.

Just off the top of my head and untested, how does this do against your corpus?

uri GIBBERISH ;://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$;i

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Gun Control is nothing more than an attempt to return to feudalism, where the peasants are helpless and must humbly petition their lord and master to protect them from bandits and thieves (when they can get around to it), and where the lords and masters can abuse the peasants whenever they like without fear of effective resistance.
---
4 days until Wolfgang Amadeus Mozart's 253rd Birthday
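A small Python harness for trying the GIBBERISH pattern against sample URIs before scoring it. The pattern is translated as-is, with the trailing `i` modifier becoming `re.IGNORECASE`. The first two URIs are hypothetical test inputs; the third is the verbose but legitimate URI that appears elsewhere in this thread.

```python
import re

# Python translation of the GIBBERISH rule suggested in the message.
GIBBERISH = re.compile(
    r"://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$",
    re.IGNORECASE,
)

# 29 letters with a "q" followed by "r" (hypothetical spam-style path).
q_without_u = "http://example.com/" + "abcdefghijklmnopqrstuvwxyzabc"

# 32 letters, but the only "q" is followed by "u", as in normal English.
english_like = "http://example.com/quickbrownfoxjumpsoverthelazydog"

# Long path containing underscores, a digit, and a query string.
punctuated = "http://fnord.com/retrieve_document_as_pdf3_file.php?123456"

results = {
    "q_without_u": GIBBERISH.search(q_without_u) is not None,
    "english_like": GIBBERISH.search(english_like) is not None,
    "punctuated": GIBBERISH.search(punctuated) is not None,
}
```

The lookahead requires the whole path to be 25 to 80 letters with nothing after it, and the `q[^u]` part then demands at least one "q" that is not followed by "u". Gibberish paths that happen to contain no such "q" will be missed, which is a limit of this heuristic rather than of the harness.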
Re: please help, getting hammered with snowshoe spam
> I've been using this rule to knock some of these down: > [...] > Highly unusual to have a url like that in ham... > I'm running a meta to bump up the score... Yes, I've actually been doing the very same thing (URI detection and metas, and then string matching in the tail part of the e-mail) ! However it has been getting tedious maintaining the string list manually, because the " Marketing" and " Media" etc. targets and addresses have been changing far more frequently now. They'll use them for a few days, then disappear completely, and new ones will appear. This type of spam is so incredibly a pain... Is there some more general way that this sort of thing could be handled? -- View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21628143.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
On Fri, 2009-01-23 at 07:56 -0800, Dennis Hardy wrote:
> Hi, I'm getting hammered by snowshoe spam :-( I've added rules to try to
> catch common formats of included URLs in the spam, but I'm wary of scoring
> these rules too high because of the potential for false positives. It's
> hard to come up with other rules as the spam e-mail content is so generic.
> By default these spams score incredibly low (bayes, etc.) In many cases,
> the low bayes values are scoring negative, which completely offsets the few
> positive scoring rules that I have added.

I've been using this rule to knock some of these down:

uri      AE_ASM  /\/[[:alpha:]]{28,40}$/
describe AE_ASM  long gibberish path used by ASM Marketing
score    AE_ASM  1

Highly unusual to have a url like that in ham... I'm running a meta to bump up the score...

--
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com
Re: please help, getting hammered with snowshoe spam
> Is this spam for snowshoes or some "spam term"?

"Like a snowshoe spreads the load of a traveler across a wide area of snow, some spammers use many frequently-changing IP addresses and domains to spread out the spam load in order to dilute recipient reputation metrics and evade filters."

See http://www.spamhaus.org/faq/answers.lasso?section=Glossary#233

> If the former, put some example up on a pastebin (not here!).

Yes, already done: http://pastebin.com/m4400a74d

--
View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21627984.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: please help, getting hammered with snowshoe spam
Dennis Hardy wrote on Fri, 23 Jan 2009 07:56:44 -0800 (PST):
> Hi, I'm getting hammered by snowshoe spam

Is this spam for snowshoes or some "spam term"? If the former, put some example up on a pastebin (not here!).

Kai

--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
Re: please help, getting hammered with snowshoe spam
> why are those scores low? What gives them negative score?
> those rules have quite high score...

Here is an example (without my rules): http://pastebin.com/m4400a74d

The ones that get through are relatively short and simple, and many are very "clean". This example is just one that focuses on weight loss; others are about tea, satellite companies, coffee makers or the like.

I worry about increasing FPs on real e-mails by training "clean" spams as spam, when they are short and sweet and often look like they could be legitimate e-mails. Also, would training bayes on this sort of e-mail help if many things are different between each e-mail, and if the e-mail is so short and relatively "clean"? Addresses change, company names change, sender domains are always different, etc.

I've been thinking about maybe writing an SA plugin that counts the three repeated URL patterns that are always present in all of these spams, but I don't know where to start in trying to do that. I was hoping I could just handle this with SA rules or something (like using another RBL or something).

Thank you!

--
View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21627664.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
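As a starting point for the URL-counting idea, here is a rough Python prototype of the logic only (not from the original post). A real SpamAssassin plugin would be written in Perl against the plugin API, which this does not attempt. It also guesses at what "repeated URL patterns" means: here it simply counts how many URLs in a message body point at the same host and flags the message at three or more. The function names, the threshold and the sample body are all hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Deliberately simple URL matcher: scheme, then everything up to whitespace,
# angle brackets or quotes. Good enough for a prototype, not for production.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def urls_per_host(body):
    """Count how many URLs in the message body point at each host."""
    counts = Counter()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname  # lower-cased by urlsplit
        except ValueError:
            continue  # malformed URL (e.g. a broken IPv6 literal); skip it
        if host:
            counts[host] += 1
    return counts

def looks_repeated(body, threshold=3):
    """True if any single host appears in at least `threshold` URLs."""
    return any(n >= threshold for n in urls_per_host(body).values())

# Hypothetical message body with three links to the same host.
sample_body = (
    "Click http://example.com/aaa or http://example.com/bbb\n"
    "Unsubscribe: http://EXAMPLE.com/ccc"
)

print(urls_per_host(sample_body))
print(looks_repeated(sample_body))
```

Plenty of legitimate newsletters also carry three or more links to one host, so on its own a count like this is probably only useful as one input to a meta, not as a scoring rule.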
Re: please help, getting hammered with snowshoe spam
On 23.01.09 07:56, Dennis Hardy wrote:
> Hi, I'm getting hammered by snowshoe spam :-( I've added rules to try to
> catch common formats of included URLs in the spam, but I'm wary of scoring
> these rules too high because of the potential for false positives. It's
> hard to come up with other rules as the spam e-mail content is so generic.
> By default these spams score incredibly low (bayes, etc.) In many cases,
> the low bayes values are scoring negative, which completely offsets the few
> positive scoring rules that I have added.

train bayes properly, it's the first thing you should do for such mail.

> Are there other RBLs or domain checks or something that could be used to
> possibly get more indication that a spam is a snowshoe spam from a "bogus"
> domain? I've also added a meta rule that combines URIBL_BLACK, DCC_CHECK,
> and my rules...but spam still gets by many times because it scores so
> low/negative otherwise. Maybe I just need to score everything higher...?

why are those scores low? What gives them negative score?
those rules have quite high score...

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Quantum mechanics: The dreams stuff is made of.
please help, getting hammered with snowshoe spam
Hi,

I'm getting hammered by snowshoe spam :-( I've added rules to try to catch common formats of included URLs in the spam, but I'm wary of scoring these rules too high because of the potential for false positives. It's hard to come up with other rules as the spam e-mail content is so generic. By default these spams score incredibly low (bayes, etc.) In many cases, the low bayes values are scoring negative, which completely offsets the few positive scoring rules that I have added.

Are there other RBLs or domain checks or something that could be used to possibly get more indication that a spam is a snowshoe spam from a "bogus" domain? I've also added a meta rule that combines URIBL_BLACK, DCC_CHECK, and my rules...but spam still gets by many times because it scores so low/negative otherwise. Maybe I just need to score everything higher...?

Any thoughts/advice are appreciated :-)

--
View this message in context: http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21627042.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
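The arithmetic behind "a negative bayes score offsets my rules" can be sketched in a few lines of Python (an illustration, not from the original post). Every score below is a made-up number, `LOCAL_URI_PATTERN` and `LOCAL_SNOWSHOE_META` are hypothetical rule names, and the meta is assumed to fire only when the local URI rule, URIBL_BLACK and DCC_CHECK all hit; the post does not say how its meta actually combines them. The threshold is SpamAssassin's default `required_score` of 5.0.

```python
# Hypothetical scores; only URIBL_BLACK and DCC_CHECK are rule names taken
# from the post, and their values here are illustrative, not the defaults.
SCORES = {
    "BAYES_00": -2.5,             # bayes says "very likely ham"
    "LOCAL_URI_PATTERN": 1.0,     # cautious score on a custom URI rule
    "URIBL_BLACK": 2.0,
    "DCC_CHECK": 2.0,
    "LOCAL_SNOWSHOE_META": 3.0,   # bonus applied only when the meta fires
}
REQUIRED_SCORE = 5.0

def meta_fires(hits):
    """Assumed meta: the local URI rule AND both network tests must hit."""
    return {"LOCAL_URI_PATTERN", "URIBL_BLACK", "DCC_CHECK"} <= set(hits)

def total_score(hits):
    """Sum the scores of all rules that hit, adding the meta when it fires."""
    hits = set(hits)
    if meta_fires(hits):
        hits.add("LOCAL_SNOWSHOE_META")
    return sum(SCORES[name] for name in hits)

# Two of three tests hit: bayes cancels most of the positive score.
partial = total_score(["BAYES_00", "LOCAL_URI_PATTERN", "URIBL_BLACK"])

# All three hit: the meta's bonus carries the message over the threshold.
full = total_score(["BAYES_00", "LOCAL_URI_PATTERN", "URIBL_BLACK", "DCC_CHECK"])

print(partial, partial >= REQUIRED_SCORE)
print(full, full >= REQUIRED_SCORE)
```

With these numbers the meta only helps when every component hits, so a message that misses one network test still slips through. Training bayes on these messages so it stops scoring them negative attacks the other side of the sum.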