Fuzzy OCR annoying Outlook users
Hey, I'm using FuzzyOCR and it works great. However, lately I've been seeing annoying Outlook users with some kind of plugin which seems to add an image with the text "Free emoticons, download here" (or something like that); mostly it's in my language and then it contains the word "gratis". The word gratis gets matched by FuzzyOCR and the mail gets an extra score of 5. So I tried adding the hash of this image:

# ./fuzzy-find --delete imstp_pets_cat1_du.gif
# ./fuzzy-find --learn-ham --score=0 imstp_pets_cat1_du.gif

However, when I scan the mail again, I'm still getting a score of 5:

5.0 FUZZY_OCR_KNOWN_HASH BODY: Mail contains an image with known hash
Words found: gratis in 1 lines
gratis in 1 lines

Any ideas how to teach FuzzyOCR not to tag this image as spam? Thanks! K.
Re: Increase of spam?
On Thu, 3 May 2007, Jerry Durand wrote: All DSL/dialup accounts get a 554 from us (using a couple of RBLs), so I've actually seen our spam decrease lately. I've used RBLs too, in the past. However, I've noticed that legitimate mailservers sometimes turn up in such lists, so we were missing mails, and there were quite a lot of complaints. I tried to put in less restrictive RBLs, but in the end I had to remove them. Now I'm thinking of enhancing my greylisting to check RBLs and, if the IP is found in an RBL, to increase the greylisting time... K.
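The RBL-aware greylisting idea could be sketched roughly like this (an illustration only, not the poster's actual setup; the DNSBL zone and the delay values are example assumptions):

```python
import socket

# Sketch: if the connecting IP is listed on a DNSBL, lengthen the
# greylisting delay instead of rejecting outright. Zone name and
# delays are invented examples, not recommendations.
DNSBL_ZONE = "zen.spamhaus.org"
BASE_DELAY = 300      # seconds a new sender triplet normally waits
LISTED_DELAY = 3600   # much longer wait for DNSBL-listed senders

def reverse_ip(ip: str) -> str:
    # DNSBLs are queried with the octets reversed: 1.2.3.4 -> 4.3.2.1
    return ".".join(reversed(ip.split(".")))

def greylist_delay(ip: str) -> int:
    query = f"{reverse_ip(ip)}.{DNSBL_ZONE}"
    try:
        socket.gethostbyname(query)  # any A record means "listed"
        return LISTED_DELAY
    except socket.gaierror:
        return BASE_DELAY

print(reverse_ip("127.0.0.2"))  # 2.0.0.127
```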
Re: Just a small nag from 3.2.0...
On Fri, 4 May 2007, Matt Kettler wrote: This apparently is fixed in perl 5.8.8, but still happens in 5.8.6, 5.8.5, etc. Hm, I have a Slackware 11.0 box with perl 5.8.8 and I'm getting the same message. This problem was already there with the previous version of SpamAssassin and FuzzyOcr. However, FuzzyOcr works fine and there are no warnings during execution, only when running spamassassin --lint. K.
Re: Increase of spam?
On Fri, 4 May 2007, Andrzej Adam Filip wrote: You can use gray-listing to avoid the blind spot (detection delay) of such lists and to increase their efficiency. Yes, this is what I will try to achieve in the future. Two standard questions to clear the picture: a) Do you block dynamic IP addresses at MTA level? b) Do you block free email services? No. I just use greylisting on every host (except for IPs from my country). K.
Re: KAM.cf ham
On Wed, 2 May 2007, Henrik Krohns wrote: I guess this doesn't hurt, but Bayes should already handle it. Most mails on my server are BAYES_00, since there is practically no spam in our language. Well, I don't entirely agree. In theory Bayes can handle things, of course, but I have words in my mail that would never occur in spam; mails with a phone number from my country almost never occur in spam... I want to give such mail a more negative weight, to fine-tune things, and I can see it works great: even mails that are written in caps etc., but which aren't spam, aren't tagged falsely. The opposite is also true, but it is more obvious: most of my spam has BAYES_99, so in theory all spam can be handled by BAYES_99. But of course, people write specific rules too, to push the spam over the limit so it gets flagged. If Bayes were fully perfect and handled everything, SARE rules wouldn't be needed ;-) K.
Increase of spam?
Hi list, I'm not sure if this is entirely on-topic, but I want to keep a close eye on it. A while ago I implemented greylisting, which works quite well. But for the last two days I've been seeing loads of mails which get past the greylisting (so they are being sent again by a real mailserver). Does anybody know if there is a new Windows virus on the loose that retries delivering mails? The mails are coming from all kinds of hosts in all kinds of countries, but mostly from dialup or ADSL accounts (so, not hijacked corporate mailservers). Thanks! K.
whitelist_from_rcvd to train bayes db?
Hi, Although I have some negative-score rules, my ham mails never score much below zero. I've set auto-learning for ham to -12 to be sure spam never gets marked as ham and my Bayes database doesn't get polluted. I think it's quite bad if ham were autolearned as spam (I guess much worse than the other way around). Anyway, I've been thinking of using whitelist_from_rcvd to mark mail from certain providers (from which I never saw spam, provided it came from the right mailserver) with a low score, so that my database also gets trained with more ham. So for example:

whitelist_from_rcvd [EMAIL PROTECTED] isp-sending-domain

Is this a good idea, or am I abusing the whitelist_from_rcvd rule? Am I missing something so that this will have a bad impact in the end? Thanks! K.
Re: KAUF-TIPP DER WOCHE spam getting through
On Wed, 28 Mar 2007, Panagiotis Christias wrote: the last days we get a lot of spam like this: KAUF-TIPP DER WOCHE

I wrote a few rules of my own, especially to catch those stock scams, together with Bayes. If you don't have any people who should write to you in German, you can also use the X-Languages tag to boost the score if the mail is written in German. Here are my current rules, which should also catch the German stock spam. Maybe there are some false positives in a real stock environment, but for me they work fine:

body __HILO_STOCKS1 /(High|Low|Curr[e3]nt|Cur(r|\r.|r[e3]nt|\.)\ P(ric[e3])?|Pric[e3]|Last)[\:\ \t]+\$[\d\ ]+?(.*)(Last|Low|Growth|Grow|High|Sale|Pric[e3]|Vol|[E3]xp)[\:\ \t]+/i
body __HILO_STOCKS2 /curr[e3]n[t7](ly)?[\ \t\_]+?\:[\ \t\_\$]+?\d/i
body __HILO_STOCKS6 /[e3](x|ks)p[e3]ct[e3]d?[\ \t\_]+?\:[\ \t\_\$]+?\d/i
body __HILO_STOCKS3 /our[\ \t\_]+?(last[\ ]+?)?pick[\:\ \t\_\;\=\,]/i
body __HILO_STOCKS4 /\d[\ \t\_]+?(c[e3]nt|dollar|[e3]ur|p[e3]nc[e3])/i
body __HILO_STOCKS5 /(c[e3]nt|dollar|[e3]ur[o]?|p[e3]nc[e3])[\ \t\_]+?\d/i
body __HILO_STOCKS9 /(hot[\ \t\_]+?list|r[e3]cord|publicity\ |n[e3]ws\ |invest|incr[e3]as[e3]|[e3]xplosion|high\ |pr[e3]mium|mark[e3]t|al[e3]rt|sym[b8]ol|the\ rush|your\ radar|g[e3]t\ [i1]n|schluss\-?stand|prognose|kauf\-?tip)/i
meta HILO_STOCKS (( __HILO_STOCKS1 || __HILO_STOCKS2 || __HILO_STOCKS3 || __HILO_STOCKS4 || __HILO_STOCKS5 || __HILO_STOCKS6 ) && __HILO_STOCKS9)
describe HILO_STOCKS Looks like stocks scam
score HILO_STOCKS 3.0
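For quick sanity-checking outside SpamAssassin, two of those Perl-style body patterns can be ported to Python (a hypothetical sketch; SA itself runs the Perl originals, and the function name below is made up for illustration):

```python
import re

# Hypothetical Python ports of two of the patterns above: a
# "Current: $..." price line and a subset of the promo keywords.
CURRENT_PRICE = re.compile(r"curr[e3]n[t7](ly)?[ \t_]+?:[ \t_$]+?\d", re.I)
PUMP_WORDS = re.compile(
    r"(hot[ \t_]+?list|r[e3]cord|invest|al[e3]rt|sym[b8]ol|prognose|kauf-?tip)",
    re.I,
)

def looks_like_stock_scam(body: str) -> bool:
    # Mirrors the meta rule: a price pattern AND a promo keyword must both hit.
    return bool(CURRENT_PRICE.search(body)) and bool(PUMP_WORDS.search(body))

print(looks_like_stock_scam("Curr3nt : $0.42 - strong buy al3rt!"))  # True
print(looks_like_stock_scam("Lunch at noon?"))                       # False
```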
Re: bayes effectiveness dropping with use of greylisting?
On Tue, 20 Mar 2007, Erik Slooff wrote: I have an interesting observation on my mail gateway (policyd for greylisting, postfix, amavisd-new and spamassassin); after implementing greylisting and other measures such as RBLs, there aren't enough spam messages coming through to keep Bayes trained. Hey, I haven't had this problem (yet); I only implemented greylisting maybe 3 weeks ago. It's an interesting problem though, and I assume I could run into it too at some point. The solution seems simple to me: just exclude a few dummy addresses (take common names) from your greylisting rules; those addresses will catch all the new spam and will train your database. I'm using smf-grey, where it's easy to exclude addresses or even entire domains from greylisting; I assume your greylisting tool can do the same. By the way, what kind of tool do you use to produce those graphs? Regards, K.
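The dummy-address idea can be sketched in a few lines (a toy illustration, not smf-grey's actual configuration; the addresses are invented examples):

```python
# Toy sketch: spamtrap recipients bypass greylisting so their
# (all-spam) mail reaches the Bayes learner and keeps it trained.
SPAMTRAPS = {"john@example.com", "susan@example.com"}

def should_greylist(recipient: str) -> bool:
    # Deliver to spamtraps immediately; greylist everyone else as usual.
    return recipient.lower() not in SPAMTRAPS

print(should_greylist("john@example.com"))       # False: straight to Bayes
print(should_greylist("real.user@example.com"))  # True: normal greylisting
```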
Duplicating a bayes database
Hello, I've already been using SpamAssassin with a shared Bayes database for quite a while. As a result, this database is quite well trained for the spam that I receive, and I'm very happy with the results. Now I need to install another server (which will serve other domains); the setup is similar and I would like to install SpamAssassin there as well, of course. My question: can I just copy the Bayes database to the new server (so that it doesn't need training from scratch)? Or is this too tricky? Are there any caveats I'm not seeing? Thank you! K.
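One way to do this, sketched as commands (the paths and hostname are examples): rather than copying the raw database files, sa-learn's backup/restore dumps the Bayes data to a portable text format, which sidesteps Berkeley DB version differences between the two hosts.

```
sa-learn --backup > bayes-backup.txt        # on the existing server
scp bayes-backup.txt newserver:/tmp/
sa-learn --restore /tmp/bayes-backup.txt    # on the new server
```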
Re: Custom Rule to catch this
On Thu, 8 Mar 2007, [EMAIL PROTECTED] wrote: I searched the list and found this rule to catch a URL with a single space (www.ledrx .com). Please help me in modifying this rule to catch a URL with a double space (www.superveils . com). body URL_WITH_SPACE m/\bhttp:\/\/[a-z0-9\-.]+[!*%, -]+\.?com\b/

Personally I would make it something like this:

# Handles www. a.com, www.a .com, www. a .com, www . a.com, ...
body __URL_WITH_SPACE1 /www[\ ]+?\.([a-z0-9\-]?\ [a-z0-9\-]?)+\.[ ]+?(com|net|org)/
# Handles www .xxx.com
body __URL_WITH_SPACE2 /www[\ ]+\.([a-z0-9\-\ ]?)+\.[\ ]+?(com|net|org)/
# Handles www.xxx. com
body __URL_WITH_SPACE3 /www[\ ]+?\.([a-z0-9\-\ ]?)+\.[\ ]+(com|net|org)/
meta URL_WITH_SPACE ( __URL_WITH_SPACE1 || __URL_WITH_SPACE2 || __URL_WITH_SPACE3 )
describe URL_WITH_SPACE Body contains a URL with a space
score URL_WITH_SPACE xx

I did a few quick tests against some URLs, though it's untested against my ham and spam boxes :-) K.
Re: Custom Rule to catch this
On Thu, 8 Mar 2007, Jeremy Fairbrass wrote: I just tested those three rules below, and none of them work with www.superveils . com (i.e. having a space both before and after that dot). Strange, it matches rule 3 with egrep:

echo 'www.superveils . com' | egrep 'www[\ ]+?\.([a-z0-9\-\ ]?)+\.[\ ]+(com|net|org)'
www.superveils . com

Of course you can add other strange characters which obfuscate the URL, like Nigel suggested (like , !, ...) K.
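A plausible explanation for the egrep/SpamAssassin discrepancy (my assumption, not something stated in the thread): GNU egrep appears to parse `x+?` as an optional `x+`, i.e. effectively zero-or-more, while Perl-compatible engines (which SpamAssassin uses) treat `+?` as a non-greedy one-or-more, so `www[\ ]+?\.` demands a space after "www" that the test URL lacks. Writing the optional spaces as `*` makes rule 3 behave the same in both. A sketch in Python, whose `re` module follows the Perl semantics:

```python
import re

# Rule 3 from the thread, with "[ ]+?" relaxed to "[ ]*" before the
# first dot so the space there is optional under Perl-style semantics.
SPACED_URL = re.compile(r"www[ ]*\.([a-z0-9\- ]?)+\.[ ]+(com|net|org)", re.I)

print(bool(SPACED_URL.search("www.superveils . com")))  # True
print(bool(SPACED_URL.search("www.example.com")))       # False
```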
Annoying stocks scams
Hi List! I'm getting hit by a bunch of annoying stock scams which aren't caught by any of my SARE lists; they keep on scoring low. So I decided to write a custom rule, which seems to work pretty well for my case:

body __HILO_STOCKS1 /(High|Low|Curr[e3]nt|Cur(r|\r.|r[e3]nt|\.)\ Price|Price)[\:\ \t]+\$[\d\ ]+?(.*)(Last|Low|Growth|High|Sale|Price)/i
body __HILO_STOCKS2 /(hotlist|r[e3]cord|publicity|n[e3]ws|invest|incr[e3]as[e3]|[e3]xplosion|pric[e3]|high|pr[e3]mium|mark[e3]t|al[e3]rt|sym[b8]ol)/i
meta HILO_STOCKS ( __HILO_STOCKS1 && __HILO_STOCKS2 )
describe HILO_STOCKS Looks like stocks scam
score HILO_STOCKS 3.5

It's my first meta rule, which only gives a score if both conditions are true, and I was wondering if there's a possibility to make the score more intelligent:
- if only __HILO_STOCKS1 fires, I would like to give it a score of maybe 0.5
- if __HILO_STOCKS2 matches as well, together with __HILO_STOCKS1, make it 3.5
Any other comments on this rule? Thanks!
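Tiered scoring like that can be expressed with a second meta rule (a sketch in the same rule syntax; the HILO_STOCKS_WEAK name is invented for this example):

```
# Fires when only the price pattern matches: small nudge.
meta HILO_STOCKS_WEAK ( __HILO_STOCKS1 && !__HILO_STOCKS2 )
describe HILO_STOCKS_WEAK Price pattern without stock keywords
score HILO_STOCKS_WEAK 0.5

# Fires when both match: full score (the existing rule).
meta HILO_STOCKS ( __HILO_STOCKS1 && __HILO_STOCKS2 )
score HILO_STOCKS 3.5
```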
Re: TextCat and Languages
On Fri, 2 Mar 2007, Matt Kettler wrote: You might be able to add a header rule that checks the X-Languages pseudo header. Great, this seems to work! I learned something new, thanks a lot! :-) K.