Fuzzy OCR annoying Outlook users

2007-05-11 Thread kshatriyak

Hey,

I'm using FuzzyOCR, which works great. Lately, however, I've been seeing 
mail from Outlook users whose plugin appends an image containing text 
like "Free emoticons, download here" (or something similar). Mostly it's 
in my language, and then it contains the word "gratis".


The word "gratis" gets matched by FuzzyOCR and the mail gets an extra 
score of 5.


So I tried adding the hash of this image:

# ./fuzzy-find --delete imstp_pets_cat1_du.gif
# ./fuzzy-find --learn-ham --score=0 imstp_pets_cat1_du.gif

However, when I scan the mail again, I'm still getting a score of 5:

   5.0 FUZZY_OCR_KNOWN_HASH   BODY: Mail contains an image with known hash
  Words found:
gratis in 1 lines

Any ideas on how to teach FuzzyOCR not to tag this image as spam?

Thanks!
K.




Re: Increase of spam?

2007-05-04 Thread kshatriyak

On Thu, 3 May 2007, Jerry Durand wrote:

All DSL/dialup accounts get a 554 from us (using a couple of RBLs), so 
I've actually seen our spam decrease lately.


I've used RBLs too, in the past. However, I've noticed that legitimate 
mail servers sometimes turn up in such lists, so we were missing mail 
and there were quite a lot of complaints. I tried switching to less 
restrictive RBLs, but in the end I had to remove them.


Now I'm thinking of enhancing my greylisting to check RBLs and, if the 
IP is found in an RBL, to increase the greylisting delay...


K.



Re: Justa a small nag from 3.2.0...

2007-05-04 Thread kshatriyak

On Fri, 4 May 2007, Matt Kettler wrote:


This apparently is fixed in perl 5.8.8, but still happens in 5.8.6,
5.8.5, etc.


Hm, I have a Slackware 11.0 box with perl 5.8.8 and I'm getting the same 
message. The problem was already there with the previous versions of 
SpamAssassin and FuzzyOcr. However, FuzzyOcr works fine: there are no 
warnings during execution, only when running spamassassin --lint.


K.



Re: Increase of spam?

2007-05-04 Thread kshatriyak

On Fri, 4 May 2007, Andrzej Adam Filip wrote:

You can use gray-listing to avoid blind spot (detection delay) of such 
lists to increase their efficiency.


Yes, this is what I will try to achieve in the future.


Two standard questions to clear the picture:
a) Do you block dynamic ip addresses at MTA level?
b) Do you block free email services?


No. I just use greylisting on every host (except for IPs from my 
country).


K.



Re: KAM.cf ham

2007-05-03 Thread kshatriyak

On Wed, 2 May 2007, Henrik Krohns wrote:

I guess this doesn't hurt, but Bayes should already handle it. Most 
mails on my server are BAYES_00, since there is practically no spam in 
our language.


Well, I don't entirely agree. In theory Bayes can handle these things, of 
course, but some words in my mail would never occur in spam, and a phone 
number from my country almost never occurs in spam... I want to give such 
mail a more negative score to fine-tune things, and I can see it works 
great: even mails that are written in all caps etc., but which aren't 
spam, aren't falsely tagged.
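
A minimal sketch of such a ham-side rule, not taken from the post: the 
rule name LOCAL_HAM_PHONE, the "+99" country prefix and the score are all 
placeholders to adapt, and the pattern assumes numbers written like 
"+99 2 123 45 67".

```
# Hypothetical rule: the name, the "+99" prefix and the score are placeholders.
# Matches the prefix followed by three or four groups of digits.
body      LOCAL_HAM_PHONE  /\+99[ .\/-]?\d{1,3}(?:[ .\/-]?\d{2,3}){2,3}\b/
describe  LOCAL_HAM_PHONE  Body contains a phone number in our national format
score     LOCAL_HAM_PHONE  -1.0
# "nice" marks a rule that is intended to carry a negative score
tflags    LOCAL_HAM_PHONE  nice
```

Keeping the score small means a spammer who copies a local phone number 
gains little, while ordinary ham still moves a bit further from the 
threshold.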


The opposite is also true, and more obvious: most of my spam hits 
BAYES_99, so in theory all spam could be handled by BAYES_99. But of 
course people also write specific rules to push spam over the threshold 
so it gets flagged. If Bayes were perfect and handled everything, SARE 
rules wouldn't be needed ;-)


K.



Increase of spam?

2007-05-03 Thread kshatriyak

Hi list,

Not sure if it's entirely on-topic, but at least I want to monitor it 
closely.


A while ago I implemented greylisting, which works quite well. But for the 
past two days I've been seeing loads of mails that get past the 
greylisting (so they are being sent again, as a real mail server would do).


Does anybody know if there is a new Windows virus on the loose that 
retries delivery? The mails are coming from all kinds of hosts in all 
kinds of countries, but mostly from dialup or ADSL accounts (so not 
hijacked corporate mail servers).


Thanks!
K.



whitelist_from_rcvd to train bayesdb ?

2007-04-27 Thread kshatriyak

Hi,

Although I have some negative-score rules, my ham mails never score far 
below zero. I've set the auto-learn threshold for ham to -12 to be sure 
spam never gets learned as ham and my Bayes database doesn't get 
polluted. I think it's quite bad if ham mail gets auto-learned as spam (I 
guess much worse than the other way around).


Anyway, I've been thinking of using whitelist_from_rcvd to give mail from 
certain providers (from which I have never seen spam when it came from 
the right mail server) a low score, so that my database also gets trained 
with more ham.


So for example:

whitelist_from_rcvd  [EMAIL PROTECTED]  isp-sending-domain

Is this a good idea, or am I abusing whitelist_from_rcvd and missing 
something that will have a bad impact in the end?
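
For reference, the setup described above might look like the sketch 
below. The sender pattern and relay domain are placeholders (the real 
ones are elided in this post). One thing worth checking before relying on 
it: as far as I know, the stock USER_IN_WHITELIST rule carries the 
noautolearn flag, in which case a whitelist hit would not count towards 
the auto-learn threshold at all.

```
# Auto-learn as ham only when a message scores below -12
bayes_auto_learn                    1
bayes_auto_learn_threshold_nonspam  -12.0

# Placeholder values: whitelist a sender pattern only when the mail was
# handed over by a relay whose reverse DNS name matches the given domain
whitelist_from_rcvd  *@example.com  example.com
```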


Thanks!
K.



Re: KAUF-TIPP DER WOCHE spam getting through

2007-03-28 Thread kshatriyak

On Wed, 28 Mar 2007, Panagiotis Christias wrote:


the last days we get a lot of spam like this:

KAUF-TIPP DER WOCHE


I wrote a few rules of my own specifically to catch those stock scams, 
together with Bayes. If nobody should be writing to you in German, you 
can also use the X-Languages pseudo-header to boost the score when the 
mail is written in German.


Here are my current rules, which should also catch the German stock 
spam. There may be some false positives in a real stock-trading 
environment, but for me they work fine:


body  __HILO_STOCKS1  /(High|Low|Curr[e3]nt|Cur(r|r\.|r[e3]nt|\.)\ P(ric[e3])?|Pric[e3]|Last)[\:\ \t]+\$[\d\ ]+?(.*)(Last|Low|Growth|Grow|High|Sale|Pric[e3]|Vol|[E3]xp)[\:\ \t]+/i

# Both patterns in one rule: a second rule with the same name would replace the first
body  __HILO_STOCKS2  /(curr[e3]n[t7](ly)?|[e3](x|ks)p[e3]ct[e3]d?)[\ \t\_]+?\:[\ \t\_\$]+?\d/i
body  __HILO_STOCKS3  /our[\ \t\_]+?(last[\ ]+?)?pick[\:\ \t\_\;\=\,]/i
body  __HILO_STOCKS4  /\d[\ \t\_]+?(c[e3]nt|dollar|[e3]ur|p[e3]nc[e3])/i
body  __HILO_STOCKS5  /(c[e3]nt|dollar|[e3]ur[o]?|p[e3]nc[e3])[\ \t\_]+?\d/i
body  __HILO_STOCKS9  /(hot[\ \t\_]+?list|r[e3]cord|publicity\ |n[e3]ws\ |invest|incr[e3]as[e3]|[e3]xplosion|high\ |pr[e3]mium|mark[e3]t|al[e3]rt|sym[b8]ol|the\ rush|your\ radar|g[e3]t\ [i1]n|schluss\-?stand|prognose|kauf\-?tip)/i


meta  HILO_STOCKS ( ( __HILO_STOCKS1 || __HILO_STOCKS2 || __HILO_STOCKS3 || __HILO_STOCKS4 || __HILO_STOCKS5 ) && __HILO_STOCKS9 )

describe  HILO_STOCKS Looks like stocks scam
score HILO_STOCKS 3.0




Re: bayes effectiveness dropping with use of greylisting?

2007-03-20 Thread kshatriyak

On Tue, 20 Mar 2007, Erik Slooff wrote:

I have an interesting observation on my mail gateway (policyd for 
greylisting, postfix, amavisd-new and spamassassin); after implementing 
greylisting and other measures such as RBLs there aren't enough spam 
messages coming through to keep bayes trained.


Hey,

I haven't had this problem (yet); I only implemented greylisting about 
three weeks ago. It's an interesting problem though, and I assume I could 
run into it too at some point.


The solution seems simple to me: just exclude a few dummy addresses (take 
common names) from your greylisting rules. Those addresses will catch all 
the new spam and will keep your database trained.


I'm using smf-grey, where it's easy to exclude addresses or even entire 
domains from greylisting; I assume your greylisting method can do the same.


btw, what kind of tool do you use to produce those graphs?

Regards,
K.



Duplicating a bayes database

2007-03-09 Thread kshatriyak

Hello,

I've been using SpamAssassin with a shared Bayes database for quite a 
while now. As a result, the database is well trained for the spam that 
I receive and I'm very happy with the results.


Now I need to install another server (which will serve other domains). 
The setup is similar and, of course, I would like to install SpamAssassin 
there as well.


My question: can I just copy the Bayes database to the new server (so 
that it doesn't need training from scratch)? Or is this too tricky, and 
are there any caveats I'm not seeing?


Thank you!
K.



Re: Custom Rule to catch this

2007-03-08 Thread kshatriyak

On Thu, 8 Mar 2007, [EMAIL PROTECTED] wrote:

I searched the list and found this rule to catch URL with single space 
(www.ledrx .com). Please help me in modifying this rule to catch URL 
with double space (www.superveils . com).


body URL_WITH_SPACE m/\bhttp:\/\/[a-z0-9\-.]+[!*%, -]+\.?com\b/


Personally I would make it something like this:

# Handles www.  a.com, www.a .com, www. a .com, www . a.com, ...
body __URL_WITH_SPACE1 /www[\ ]+?\.([a-z0-9\-]?\ [a-z0-9\-]?)+\.[ ]+?(com|net|org)/

# Handles www .xxx.com
body __URL_WITH_SPACE2 /www[\ ]+\.([a-z0-9\-\ ]?)+\.[\ ]+?(com|net|org)/
# Handles www.xxx. com
body __URL_WITH_SPACE3 /www[\ ]+?\.([a-z0-9\-\ ]?)+\.[\ ]+(com|net|org)/

meta URL_WITH_SPACE ( __URL_WITH_SPACE1 || __URL_WITH_SPACE2 || __URL_WITH_SPACE3 )

describe URL_WITH_SPACE Body contains a URL with a space
score URL_WITH_SPACE xx

I did a few quick tests against some URLs, though it's untested against 
my ham and spam boxes :-)


K.



Re: Custom Rule to catch this

2007-03-08 Thread kshatriyak

On Thu, 8 Mar 2007, Jeremy Fairbrass wrote:

I just tested those three rules below, and none of them work with 
www.superveils . com (ie. having a space both before and after that 
dot).


Strange, it matches rule 3 with egrep:

echo 'www.superveils . com' | egrep 'www[\ ]+?\.([a-z0-9\-\ ]?)+\.[\ ]+(com|net|org)'

www.superveils . com
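
A possible explanation for the difference, offered as a guess rather than 
a confirmed diagnosis: SpamAssassin rules are Perl regular expressions, 
where "[\ ]+?" means "one or more spaces, matched lazily", so "www[\ ]+?\." 
cannot match "www." without a space in between. egrep has no lazy 
quantifiers, and GNU egrep appears to read "+?" as an optional 
one-or-more, which would explain why the same pattern matches there. 
A sketch of a single rule that uses "*" wherever a space is optional (the 
rule name and score are hypothetical, and it is untested against real 
mail):

```
# Hypothetical alternative, untested against ham/spam corpora.
# " *" allows zero or more spaces; the alternation requires at least
# one space directly before or after the final dot.
body      URL_WITH_SPACE_ALT  /\bwww *\. *[a-z0-9-]+(?: +\. *| *\. +)(?:com|net|org)\b/i
describe  URL_WITH_SPACE_ALT  Body contains a URL with spaces around a dot
score     URL_WITH_SPACE_ALT  0.1
```

This covers "www.ledrx .com", "www.a. com" and "www.superveils . com", 
but deliberately not a plain "www.example.com", and it only handles a 
single label between "www." and the top-level domain.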

Of course you can add other strange characters that obfuscate the URL, 
as Nigel suggested (like , !, ...)


K.



Annoying stocks scams

2007-03-06 Thread kshatriyak

Hi List!

I'm getting hit by a bunch of annoying stock scams which aren't caught by 
any of my SARE rule sets; they keep scoring low.


So I decided to write a custom rule, which seems to work pretty well for 
my case:


body  __HILO_STOCKS1  /(High|Low|Curr[e3]nt|Cur(r|r\.|r[e3]nt|\.)\ Price|Price)[\:\ \t]+\$[\d\ ]+?(.*)(Last|Low|Growth|High|Sale|Price)/i
body  __HILO_STOCKS2  /(hotlist|r[e3]cord|publicity|n[e3]ws|invest|incr[e3]as[e3]|[e3]xplosion|pric[e3]|high|pr[e3]mium|mark[e3]t|al[e3]rt|sym[b8]ol)/i


meta  HILO_STOCKS ( __HILO_STOCKS1 && __HILO_STOCKS2 )
describe  HILO_STOCKS Looks like stocks scam
score HILO_STOCKS 3.5

It's my first meta rule, which only gives a score if both conditions are 
true, and I was wondering if it's possible to make the scoring more 
fine-grained:


- if only __HILO_STOCKS1 fires, I would like to give a score of maybe 0.5
- if __HILO_STOCKS2 matches as well, together with __HILO_STOCKS1, make it 
3.5
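
One way this could be sketched, relying on the fact that rules whose 
names start with a double underscore never score on their own: give the 
price pattern its own scored rule, and let the combined meta rule add the 
rest. HILO_STOCKS_PRICE is a hypothetical name; the two sub-rules are the 
ones defined above.

```
# Scores 0.5 whenever the price pattern matches
meta      HILO_STOCKS_PRICE  __HILO_STOCKS1
describe  HILO_STOCKS_PRICE  Body contains a stock price listing
score     HILO_STOCKS_PRICE  0.5

# Adds 3.0 when the keyword list matches too, for 3.5 in total
meta      HILO_STOCKS        ( __HILO_STOCKS1 && __HILO_STOCKS2 )
describe  HILO_STOCKS        Looks like stocks scam
score     HILO_STOCKS        3.0
```

Because scores of matching rules add up, the second rule carries only the 
difference between the two targets rather than the full 3.5.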


Any other comments on this rule?

Thanks!



Re: TextCat and Languages

2007-03-02 Thread kshatriyak

On Fri, 2 Mar 2007, Matt Kettler wrote:


You might be able to add a header rule that checks the  X-Languages
pseudo header.


Great, this seems to work! I learned something new, thanks a lot! :-)
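
For anyone finding this thread later, a sketch of what such a rule might 
look like. It assumes the TextCat plugin is loaded so that the 
X-Languages pseudo-header is filled with the detected language codes; the 
rule name and the score are placeholders.

```
# Requires the TextCat plugin (Mail::SpamAssassin::Plugin::TextCat)
# Hypothetical rule: fires when "de" is among the detected languages
header    LOCAL_LANG_DE  X-Languages =~ /\bde\b/
describe  LOCAL_LANG_DE  TextCat detected German text
score     LOCAL_LANG_DE  1.0
```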

K.