Re: Stock spam in images

2006-10-04 Thread Jason Haar
I'm having marvelous luck with FuzzyOCR - but the spammers are learning too.

When I first started using it just a couple of months ago, it really
whacked the image-based spam. You could see why when gocr file.gif
returned nice text that was easy to match against.

However, now is a different matter. I just got a lose weight spam 10
minutes ago that gocr returns as:

  lI__c_tc)r _rc_hc_rihc_Ll _cnLl .h1c_Llic_;cll_ _u__c_c __ihc LI
  l c htc)hlc_rc)c_c_ B llr_ll l hc r_cp_


_ t4 __cc_'un ic) __'ri_c _ hH3s, t_k   _ ,r o_E,y _h K E,_
_ ,_ics r _ sncu)._r. t.ihk). lhirkrr x_))  '   gg __, r
_ Krvc)_H t)r r_irk cct .__ _
 O _' Y O ___ TE_ E
 _Lncl nLnn __ mc)R hnrtb

That tells me to go to www.realhgh dot org , but their GIF processing
munged it enough to slip by gocr

Not much FuzzyOCR can do with that :-(

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



Re: Score=x ?

2006-10-04 Thread Benny Pedersen

On Wed, October 4, 2006 05:59, M.Lewis wrote:
 X-Spam-Status: No, score=x tagged_above=-999 required=5 tests=[]

Check amavisd.conf to see whether $sa_mail_body_size_limit is higher than the
size of this mail; if it is, amavisd disabled scanning of this mail.

-- 
This message was sent using 100% recycled spam mails.



Re: Score=x ?

2006-10-04 Thread Benny Pedersen

On Wed, October 4, 2006 09:18, Benny Pedersen wrote:
 On Wed, October 4, 2006 05:59, M.Lewis wrote:
 X-Spam-Status: No, score=x tagged_above=-999 required=5 tests=[]
 Check amavisd.conf to see whether $sa_mail_body_size_limit is higher than the
 size of this mail; if it is, amavisd disabled scanning of this mail.

Sorry, it's lower, not higher. This is the only time I have seen score=x with
amavisd. You can raise the size limit so these mails get scanned as well; just
don't set the limit too high, but still high enough not to let spam through.
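
A minimal sketch of the check being described, assuming amavisd simply skips SpamAssassin for any message larger than $sa_mail_body_size_limit. The 200 KiB limit and the function name are illustrative assumptions, not values taken from this thread; only the 268KB message size is.

```python
# Illustrative only: models the size check described above, not amavisd's own code.
ASSUMED_LIMIT_BYTES = 200 * 1024  # hypothetical value for $sa_mail_body_size_limit


def is_scanned(message_size_bytes: int, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> bool:
    """Return True if a message of this size would still be handed to SpamAssassin."""
    return message_size_bytes <= limit_bytes


if __name__ == "__main__":
    size = 268 * 1024  # the 268KB message reported in this thread
    print(is_scanned(size))              # False under the assumed 200 KiB limit
    print(is_scanned(size, 512 * 1024))  # True once the limit is raised
```

If the real limit on the box is below the message size, score=x is what you would expect to see; raising the limit trades scan time for coverage.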

-- 
This message was sent using 100% recycled spam mails.



Re: Score=x ?

2006-10-04 Thread M.Lewis

Benny Pedersen wrote:

On Wed, October 4, 2006 05:59, M.Lewis wrote:

X-Spam-Status: No, score=x tagged_above=-999 required=5 tests=[]


Check amavisd.conf to see whether $sa_mail_body_size_limit is higher than the
size of this mail; if it is, amavisd disabled scanning of this mail.



Thanks Benny. I seriously doubt that was it as the message in question 
was 268KB. However I will check it out.


Thank you very much!
Mike

--

 Software engineer: One who engineers others into writing the code for 
him/her.

  02:35:01 up  2:18,  7 users,  load average: 0.46, 0.31, 0.24

 Linux Registered User #241685  http://counter.li.org


Re: perl hogging my memory?

2006-10-04 Thread Justin Mason

hey, feel free to edit around that FAQ too, Matt ;)

Right now I think that question really *is* the most FA'd Q.

--j.

Matt Kettler writes:
 Woot!! Thank you Justin and the rest of the Wiki crew for putting that up!
 
 I was getting tired of writing the Are you using sa-blacklist.cf?
 email over, and over again.
 
 Justin Mason wrote:
  have you looked at
  http://wiki.apache.org/spamassassin/OutOfMemoryProblems ?
  note especially the 'Heavyweight custom rules' section.
 
  --j.
 
  Evan Platt writes:

  Ok, I've googled and obviously I'm not finding the right solution.. 
  But had to reinstall spamassassin on my os/x 10.4 box.
 
  Followed http://developer.apple.com/server/fighting_spam.html .
 
  But, my system is running out of memory, and it looks like Perl / 
  spamassassin is the cause . I've omitted everything but the Perl and 
  Spamassassin related entries:
 
  Load Avg:  1.97, 1.36, 0.78 CPU usage:  84.4% user, 15.6% sys, 0.0% 
  idle
  SharedLibs: num =  106, resident = 3.54M code,  364K data,  780K LinkEdit
  MemRegions: num =  4984, resident =  217M + 1.37M private,  236M shared
  PhysMem:  44.7M wired,  307M active,  153M inactive,  506M used, 5.54M free
  VM: 4.00G + 79.0M   50554(137) pageins, 65232(79) pageouts
 
  PID COMMAND   %CPU   TIME    #TH #PRTS #MREGS  RPRVT  RSHRD  RSIZE  VSIZE
  448 spamc     0.0%  0:00.00    1    15     18   128K   268K-  396K  27.7M
  447 procmail  0.0%  0:00.00    1     8     16     8K-  364K-  176K  26.7M
  445 procmail  0.0%  0:00.02    1    15     16     8K-  364K-  412K  26.7M
  416 perl     35.1%  0:10.60    1    10    391  30.4M   233M- 94.2M   391M
  394 spamc     0.0%  0:00.00    1    15     18    88K   268K-  356K  27.7M
  393 procmail  0.0%  0:00.00    1     8     16     8K   316K-  172K  26.7M
  391 procmail  0.0%  0:00.02    1    15     16     8K   316K-  364K  26.7M
  378 perl     10.1%  0:48.50    1    10    388   150M+  207M-  217M+  391M
  377 perl     44.7%  1:18.63    1    10    388  26.3M   233M- 72.8M   391M
  271 perl      0.0%  0:00.12    1    10     43  1.93M   284K  1.07M  29.1M
   65 perl      0.0%  3:41.24    1    15    387  1.43M-  233M- 56.9M-  391M
 
 
  So what did I do wrong that's causing a Perl process to take up 391 megs?
 
  Obviously, I'm only guessing it's spamassassin related, but that's 
  the only thing I can think of using perl. And I see a few google 
  reference to spamassassin and perl.
 
 
  Any other information I can provide, please let me know.
 
 
  Thanks.
 
  Evan
  
 
 


Re: HELO test rule-writing questions

2006-10-04 Thread Justin Mason

Clifton Royston writes:
   I'm trying to write some SA rules for additional tests on the
 connecting mailserver's SMTP HELO string, and I have some questions
 about how to do it.  Should I send them to this list or to the
 dev list?

hey Clifton! -- yep, this list.

   Assuming it's this list, one of the things I'm trying to do is assign
 a modest score to helo strings containing a bracketed IP address. 
 (This is technically valid in SMTP.)
 
   I've read through some of the tests in 20_fake_helo_tests.cf, and it
 appears they rely on SA's parsing code creating a kind of magic
 pseudo-header X-Spam-Relays-Untrusted containing a string with the
 helo and other data?
 
   I'm not sure I get the point of the recurring [^\]]+ bits in the
 examples I looked at.

So, the deal is that 'X-Spam-Relays-Untrusted' will contain *all*
untrusted relays, one after the other.  /^[^\]]+ / ensures that
only the helo string from the *most recent* untrusted relay --
the handover into the trusted networks -- is checked.

This is required because it's perfectly fine for a user's MUA
to use this kind of helo string; the spammy case is when an
MTA which is supposedly run by an ISP is handing it over to
the recipient's MX, and that one should not use that style
of helo.

See http://wiki.apache.org/spamassassin/TrustedRelays for more info.

   So would a test for a bracketed IP address look like this?
 
 # [60.222.35.88]
 header HELO_BRACKETED_IP  X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=\[\d+\.\d+\.\d+\.\d+\][^\]]+ auth= /i

   I want to distinguish this case from a bare IP address (invalid!)
 which I also want to look at and score:
 
 # [60.222.35.88]
 header HELO_BARE_IP  X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=\d+\.\d+\.\d+\.\d+[^\]]+ auth= /i

both look good.  be sure to let us know if you find something useful ;)
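
To make the role of the leading /^[^\]]+ / concrete, here is a small sketch in Python using the same shape as the bare-IP pattern quoted above. The two pseudo-header strings are made-up examples and their field layout is an assumption; check a real X-Spam-Relays-Untrusted value from `spamassassin -D` output before relying on it.

```python
import re

# Hypothetical pseudo-header values; the field layout is assumed for illustration.
RELAYS_HOSTNAME_FIRST = (
    "[ ip=192.0.2.10 rdns=mail.example.net helo=mail.example.net "
    "by=mx.example.org ident= envfrom= intl=0 id=AAA auth= ] "
    "[ ip=192.0.2.20 rdns= helo=60.222.35.88 "
    "by=mail.example.net ident= envfrom= intl=0 id=BBB auth= ]"
)
RELAYS_BARE_IP_FIRST = (
    "[ ip=192.0.2.20 rdns= helo=60.222.35.88 "
    "by=mx.example.org ident= envfrom= intl=0 id=CCC auth= ]"
)

# ^[^\]]+ cannot cross a "]", so the helo= it finds must belong to the first
# relay in the string, i.e. the most recent untrusted one.
ANCHORED = re.compile(r"^[^\]]+ helo=\d+\.\d+\.\d+\.\d+[^\]]+ auth= ", re.I)
# Without the anchor, a bare-IP HELO anywhere in the chain would match.
UNANCHORED = re.compile(r" helo=\d+\.\d+\.\d+\.\d+[^\]]+ auth= ", re.I)


def first_relay_has_bare_ip_helo(relays: str) -> bool:
    return ANCHORED.search(relays) is not None


def any_relay_has_bare_ip_helo(relays: str) -> bool:
    return UNANCHORED.search(relays) is not None


if __name__ == "__main__":
    print(first_relay_has_bare_ip_helo(RELAYS_HOSTNAME_FIRST))  # False
    print(any_relay_has_bare_ip_helo(RELAYS_HOSTNAME_FIRST))    # True
    print(first_relay_has_bare_ip_helo(RELAYS_BARE_IP_FIRST))   # True
```

In the first sample only an older hop used a bare IP, so the anchored pattern stays quiet, which is the behaviour wanted when a user's MUA (rather than the handing-over MTA) used the odd HELO.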

--j.


RE: Problem with URIBL rules : false positive and not listed while mannually checking

2006-10-04 Thread Fabien GARZIANO

 What version of SpamAssassin are you running?  Versions before
 3.1 have an infrequent DNS query bug:
 
   http://bugzilla.spamassassin.org/show_bug.cgi?id=3997
 

I'm running SpamAssassin version 3.0.5. (On Perl 5.8.6).
I've checked the Bugzilla page about this bug. I don't understand a
damn thing 8-|... I guess that I need to update my SpamAssassin setup
and I'm scared. I'm gonna check the wiki for advice on SpamAssassin
updates, but first, get a horseshoe, and recite a hundred mantras!

 Another possibility is that there is a DNS proxy or DNS 
 modification service like OpenDNS changing the DNS results in 
 a way that's not compatible with SURBL applications:
 
   http://www.surbl.org/faq.html#opendns

I don't run any DNS service on this box... It's a clean MailScanner VM
and I don't see any process named 'dns' with ps ax.

 In any case, none of the domains mentioned are blacklisted, 
 so there is a problem with your SpamAssassin or DNS.

About the checks, did you use
http://www.rulesemporium.com/cgi-bin/uribl.cgi ?
Do you know a way to see the result for each test (PH, OB, etc.)?

Thank you for this answer, Jeff.


Re: Spamassassin Rules

2006-10-04 Thread Jeff Chan
On Tuesday, October 3, 2006, 6:57:21 PM, Loren Wilton wrote:
 If you don't have network rules enabled you should enable them.  The
 URIBL-type rules will probably catch the vast majority of this junk.  Most 
 of the mis-spelled pharma stuff I get scores around 50.

See:

  http://www.surbl.org/

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Problem with URIBL rules : false positive and not listed while mannually checking

2006-10-04 Thread Loren Wilton

I'm running SpamAssassin version 3.0.5. (On Perl 5.8.6).
I've checked the Bugzilla page about this bug. I don't understand a
damn thing 8-|... I guess that I need to update my SpamAssassin setup
and I'm scared. I'm gonna check the wiki for advice on SpamAssassin
updates, but first, get a horseshoe, and recite a hundred mantras!


Updating from 3.0.5 to the current version isn't particularly painful; 
certainly not as hard as 2.6 to 3.x was.
The main thing to look out for is features that have moved to plugins; in some
cases you will have to enable the plugins to keep your current functionality.
Just look for the *.pre files and uncomment anything that seems
appropriate.




Re: Problem with URIBL rules : false positive and not listed while mannually checking

2006-10-04 Thread Justin Mason

Loren Wilton writes:
  I'm running SpamAssassin version 3.0.5. (On Perl 5.8.6).
  I've checked the Bugzilla page about this bug. I don't understand a
  damn thing 8-|... I guess that I need to update my SpamAssassin setup
  and I'm scared. I'm gonna check the wiki for advice on SpamAssassin
  updates, but first, get a horseshoe, and recite a hundred mantras!
 
 Updating from 3.0.5 to the current version isn't particularly painful; 
 certainly not as hard as 2.6 to 3.x was.
 The main thing to look out for is features that have moved to plugins; in some
 cases you will have to enable the plugins to keep your current functionality.
 Just look for the *.pre files and uncomment anything that seems
 appropriate.

and read the UPGRADE file -- these things are all called out there.

--j.


Re: FuzzyOCR seems to not like gif and png

2006-10-04 Thread decoder
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Loren Wilton wrote:
 There are newer versions of FuzzyOCR that probably fix or at least
 get around this.  A lot of image spam mails have broken images in
 them, and this messes up a lot of stuff.  The latest versions use
 ImageMagick.  This is reputedly hard to install on many systems.  But
 if you can get it installed it seems to work much better in terms of
 the images that it can handle.
 
 You might want to join the FuzzyOCR mailing list:
 
 List-Id: devel-spam.lists.own-hero.net
 List-Unsubscribe:
 http://lists.own-hero.net/mailman/listinfo/devel-spam,
  mailto:[EMAIL PROTECTED]
 List-Archive: http://lists.own-hero.net/mailman/private/devel-spam
 List-Post: mailto:[EMAIL PROTECTED]
 List-Help: mailto:[EMAIL PROTECTED]
 List-Subscribe: http://lists.own-hero.net/mailman/listinfo/devel-spam,
  mailto:[EMAIL PROTECTED]
 If you search the list archive you will see a number of posts on the
 current release and where to get it.  I think the current version is
 something like J.
The current version is b. J is a devel version as are all versions
higher than b. Please note that when trying out these versions. A new
stable version will follow soon, once I get the time again.

Chris
 
 Loren
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFI61qJQIKXnJyDxURAkZTAJwN39dvgOtmYg4gp63OAivuBx8cYQCgjH7c
f3p/ug6HPt+YEjoly1iETPA=
=wgR7
-END PGP SIGNATURE-



RE: Spamassassin Rules

2006-10-04 Thread Chris Santerre







 -Original Message-
 From: Jeff Chan [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, October 04, 2006 6:13 AM
 To: Loren Wilton
 Cc: users@spamassassin.apache.org
 Subject: Re: Spamassassin Rules
 
 
 On Tuesday, October 3, 2006, 6:57:21 PM, Loren Wilton wrote:
  If you don't have network rules enabled you should enable them. The
  URIBL-type rules will probably catch the vast majority of 
 this junk. Most 
  of the mis-spelled pharma stuff I get scores around 50.
 
 See:
 
 http://www.surbl.org/
 


Taste: 


 http://www.uribl.com/



Sorry Jeff I couldn't resist. I'm in a weird mood today. ;) 
Much love for the SURBL team! 


Go Patriots!


Thanks,


Chris Santerre
SysAdmin and Spamfighter
www.rulesemporium.com
www.uribl.com







Failing install - CPAN NET::DNS

2006-10-04 Thread Thomas Ericsson

Hi all

Tried to install NET::DNS via CPAN to get my network tests going but  
get the following error report at the end:


sudo cpan -i Net::DNS

snip

Running make test
PERL_DL_NONLAZY=1 /usr/local/bin/perl -MExtUtils::Command::MM -e  
test_harness(0, 'blib/lib', 'blib/arch') t/*.t

t/00-load..ok
t/00-pod...skipped
all skipped: Test::Pod v0.95 required for testing POD
t/00-version...ok
t/01-resolver-env..ok
t/01-resolver-file.ok
7/8 skipped: Could not read configuration file
t/01-resolver-opt..ok
t/01-resolver..ok
t/02-headerok
t/03-question..ok
t/04-packet-unique-pushok
t/04-packetok
t/05-rr-optok
t/05-rr-rrsort.ok
t/05-rr-sshfp..skipped
all skipped: Digest::BubbleBabble not installed.
t/05-rr-txtok
t/05-rr-unknownok
t/05-rrok
t/06-updateok
t/07-misc..ok
t/08-onlineok 73/93
#   Failed test 'Socket is ready'
#   in t/08-online.t at line 176.
t/08-onlineok 93/93# Looks like you failed 1 test of 93.
t/08-onlinedubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 74
Failed 1/93 tests, 98.92% okay (less 3 skipped tests: 89  
okay, 95.70%)

t/09-tkey..ok
t/10-recurse...ok
t/11-escapedchars..# Using the  XS compiled dn_expand function
t/11-escapedchars..ok 96/141#
# disabling XS based dns_expand for a moment.
t/11-escapedchars..ok 99/141#
# Continuing to use the XS based dn_expand()
t/11-escapedchars..ok
t/11-inet6.ok
10/11 skipped: Socket6 and or IO::Socket::INET6 not loaded
Failed Test   Stat Wstat Total Fail  Failed  List of Failed
 
---

t/08-online.t1   256931   1.08%  74
2 tests and 20 subtests skipped.
Failed 1/24 test scripts, 95.83% okay. 1/1057 subtests failed, 99.91%  
okay.

make: *** [test_dynamic] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force

Any tips on how to tweak to pass the test or do you think it is safe  
to use a bit of force? Could it be the missing IO::Socket::INET6?


Setup: OSX 10.3.9, Communigate 4.2.8, CGPSA 1.4, SA 3.1.3


Thomas Ericsson
_
Fido Film AB
StadsgÄrden 17
SE-116 45 Stockholm
T: int+46 (0)8 556 990 06
F: int+46 (0)8 556 990 01
http://www.fido.se
_






RE: Spamassassin Rules

2006-10-04 Thread Sietse van Zanen


Yes, spamassassin definitely RULES! ;-D


Re: perl hogging my memory?

2006-10-04 Thread Matt Kettler
I definitely have to say that between OutOfMemoryProblems and TrustPath
we've probably covered about 20% of the problems on the list :)

Justin Mason wrote:
 hey, feel free to edit around that FAQ too, Matt ;)

 Right now I think that question really *is* the most FA'd Q.

 --j.

   



R: perl hogging my memory?

2006-10-04 Thread Giampaolo Tomassoni
I dream of an amavis+spamassassin system developed in C++, with a
separate rule compiler and matching daemon...

Also, this business of running a Perl regex match for each (enabled) rule is a
bit brain damaged... Why not invert the flow and build something like flex +
bison, i.e. a grammar parser: you feed it your text, and it replies with
the rules that hit.

Well, just an idea.

PS: brain damage is just a euphemism: it actually works!

giampaolo


 I definitely have to say that between OutOfMemoryProblems and TrustPath
 we've probably covered about 20% of the problems on the list :)
 
 Justin Mason wrote:
  hey, feel free to edit around that FAQ too, Matt ;)
 
  Right now I think that question really *is* the most FA'd Q.
 
  --j.
 

 



Re: Problem with URIBL rules : false positive and not listed while mannually checking

2006-10-04 Thread Jeff Chan
On Wednesday, October 4, 2006, 3:11:16 AM, Fabien GARZIANO wrote:

 Another possibility is that there is a DNS proxy or DNS
 modification service like OpenDNS changing the DNS results in 
 a way that's not compatible with SURBL applications:
 
   http://www.surbl.org/faq.html#opendns

 I dont run any dns service on this box ... It's a clean MailScanner VM
 and I dont see no process named 'dns' with ps ax

There's usually some DNS service on the box or on your local or
ISP network.  If you're on a Unix/Linux/BSD box it's usually
called 'named'.  As long as DNS isn't doing anything unusual,
then it's a non-issue.  Just use normal, default DNS service if
your message volume is less than 100k to 250k per day.

 In any case, none of the domains mentioned are blacklisted, 
 so there is a problem with your SpamAssassin or DNS.

 About the checks, did you use
 http://www.rulesemporium.com/cgi-bin/uribl.cgi ?

I did a local DNS query:

  dig somedomain.com.multi.surbl.org a

If you get NXDOMAIN then it's not listed.

 Do you know a way to see result for each test (PH, OB, etc ... ) ?

  dig somedomain.com.multi.surbl.org txt

will show the lists; so will the lookup page, and so will:

  spamassassin -D  some_message_in_a_file
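
For scripting the same lookup, here is a rough sketch of how the query name is built and how a multi.surbl.org A answer could be split into the individual lists. The bit values are the ones I recall from the stock URIBL_*_SURBL rules shipped with SpamAssassin 3.1, so treat them as an assumption and verify against the current SURBL documentation; the function names are hypothetical.

```python
from typing import List

SURBL_ZONE = "multi.surbl.org"

# Assumed bit assignments (recalled from SpamAssassin 3.1's SURBL rules).
ASSUMED_LIST_BITS = {2: "SC", 4: "WS", 8: "PH", 16: "OB", 32: "AB", 64: "JP"}


def surbl_query_name(domain: str, zone: str = SURBL_ZONE) -> str:
    """Build the name to pass to `dig <name> a`, e.g. somedomain.com.multi.surbl.org."""
    return f"{domain.strip('.').lower()}.{zone}"


def decode_surbl_answer(a_record: str) -> List[str]:
    """Map the last octet of a 127.0.0.x answer onto list names."""
    last_octet = int(a_record.split(".")[-1])
    return [name for bit, name in sorted(ASSUMED_LIST_BITS.items()) if last_octet & bit]


if __name__ == "__main__":
    print(surbl_query_name("somedomain.com"))
    print(decode_surbl_answer("127.0.0.24"))  # 8 + 16
```

An NXDOMAIN reply means there is no A record to decode, i.e. the domain is not listed.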

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



RE: Problem with URIBL rules : false positive and not listed while mannually checking

2006-10-04 Thread Fabien GARZIANO
 I did a local DNS query:
 
   dig somedomain.com.multi.surbl.org a
 
 If you get NXDOMAIN then it's not listed.
 
  Do you know a way to see result for each test (PH, OB, etc ... ) ?
 
   dig somedomain.com.multi.surbl.org txt
 
 will show the lists; so will the lookup page, and so will:
 
   spamassassin -D  some_message_in_a_file

Thanks a lot for the tip with dig. That's what I was looking for. 

 There's usually some DNS service on the box or on your local 
 or ISP network.  If you're on a Unix/Linux/BSD box it's 
 usually called 'named'.  As long as DNS isn't doing anything 
 unusual, then it's a non-issue.  Just use normal, default DNS 
 service if your message volume is less than 100k to 250k per day.

And for DNS, I'm sorry, I typed it too fast: when I said no 'dns' I
also meant no 'named' process.
On this box, I've tried
 :# dig nortel.com.multi.surbl.org a
and it returned NXDOMAIN as you said, so I guess it may not be a DNS
problem on this box
(the DNS server answering is my ISP's).
I think I'm gonna update SpamAssassin anyway; this should be a good reason
to do it.

Thanks for all these good answers!
P.S.: sorry Jeff if you receive this email twice


FuzzyOCR gocr spins wheels

2006-10-04 Thread Shue, Daniel G.
Hi people,
Got a problem with FuzzyOCR.  I'm using version 2.3j but had
the same problem with 2.1.  At points throughout the day there will be
3 - 4 instances of gocr running, all fighting over CPU.  They run
forever, though I thought that SA would kill the processes if they took too
long, which would be fine with me.  Anyway, this of course slows the box and
mail backs up.  I know when I was at 2.1 I went with GIFLIB ver
.41 instead of .40.  From there I couldn't apply the patch that the
plug-in comes with.  I thought that this would be the culprit for
sure if there was one, but I thought I'd check here first before I got
into the hassle of removing some libraries.

TIA

Daniel


This email and any files transmitted with it are confidential and intended for 
use only by the individual or entity named above.  If you are not the intended 
recipient or the employee or agent responsible for delivering this message to 
the intended recipient, you are hereby notified that any disclosure, 
dissemination, distribution, copying of this communication, or unauthorized use 
is strictly prohibited.  Please notify us immediately by reply email and then 
delete this message from your system.   Please note that any views or opinions 
presented in this email are solely those of the author and do not necessarily 
represent those of Randolph County Government.  This email and any file 
attachments have been scanned for potential viruses; however, the recipient 
should check this email for the presence of viruses and/or malicious code.  
Randolph County accepts no liability for any damage transmitted via this email.


ImageInfo Bug

2006-10-04 Thread Stuart Johnston

Dallas,

I think there is a bug in the image_size_range function.

my $name = $type.'_dems';

Should probably be more like:

my $name = "dems_$type";

Thanks,
Stuart


RE: Stock spam in images

2006-10-04 Thread Chris Santerre





Greetings list, 


 The old-timers on the list know I tend to try things outside the norm, like my strong resistance to sitewide bayes. Well, for months I've been using a simpler approach to these stock spams with images. I don't look at the image at all. Heresy I know, but that's the way I roll :)

 This goes back to my old philosophy: one rule hit (whether FP, FN, or legit) should not make a message an FP, FN, or legit on its own.

 With that in mind, I wrote a series of 3-4 simple rules, scored them low, and watched the results. These are unpublished rules, and I'm not sure they are ready to be published just yet. But this is about the idea behind what I'm doing.

 Simple example: Is there even an inline image? (Note: I'm talking about a src="" here, not an image attached to the email!) Well if there is, why not add low points? Which is what I do. I actually score this at a crazy 1.5! Before you scream to the heavens that I'm nuts, let me continue.

 EVERY ONE of these stock image spams has hit multiple rules: SARE rules, standard rules, and the 3-4 rules I wrote from finding the simple patterns in these spams. This is the key. Combined rule hits mark it as spam. I've yet to see a single FP caused by ONE of these rules. Sure, if a legit mail comes thru with a src="" it will hit the rule. But I've never seen one that also hit the other rules and got pushed over the marking threshold. This is not a new idea by any means, but one that seems to be lost under newfangled FuzzyOCR.

 I think FuzzyOCR is wonderful. ImageInfo is great! But IMHO, they waste too many CPU cycles and too much energy. Spammers are already trying animated GIFs and noise. I wanted to quietly give this method a try and it seems to be working beautifully.

 I say my rules aren't ready for publishing because for the public I'd like the rules to be tighter, probably used as metas to reduce FPs in general world usage. Anyway, I just wanted to say that sometimes the simple ways still work great!
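
The "combined hits, low individual scores" idea can be sketched as simple additive scoring. Only the 1.5 score for the inline-image rule and the required=5 threshold come from this digest; every rule name and the other scores below are hypothetical placeholders, since the actual rules are unpublished.

```python
REQUIRED_SCORE = 5.0  # the required=5 threshold seen in headers quoted on this list

# Hypothetical rule names and scores, chosen only to illustrate the idea.
ASSUMED_SCORES = {
    "INLINE_IMG_SRC": 1.5,   # the "crazy 1.5" inline-image rule
    "STOCK_PATTERN_A": 1.2,
    "STOCK_PATTERN_B": 1.3,
    "SARE_STOCK_HIT": 1.8,
}


def total_score(hits, scores=ASSUMED_SCORES):
    """Sum the scores of every rule that hit."""
    return sum(scores[name] for name in hits)


def is_spam(hits, required=REQUIRED_SCORE, scores=ASSUMED_SCORES):
    """A message is marked only when the combined score reaches the threshold."""
    return total_score(hits, scores) >= required


if __name__ == "__main__":
    print(is_spam(["INLINE_IMG_SRC"]))      # a lone hit stays below the threshold
    print(is_spam(list(ASSUMED_SCORES)))    # all four hits together cross it
```

Because no single score reaches the threshold, a legitimate mail that happens to carry an inline image is not marked on that hit alone.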

(Any spelling errors in this post are your fault!)


Thanks,


Chris Santerre
SysAdmin and Spamfighter
www.rulesemporium.com
www.uribl.com






Re: What's the best method to use SA?

2006-10-04 Thread Clifton Royston
On Wed, Oct 04, 2006 at 04:43:37AM +, Monty Ree wrote:
 Hello.
 
 I have used SA using with procmail.
 and clamav + sendmail(libmilter) against virus.
 
 But I have found that other related solutions like 
 http://www.mailscanner.info/ or 
 http://www.amavis.org/. 
 
 I don't know what's the difference or better between SA using procmail or 
 above solutions.
 more fast or more effective??

  The original amavis is not well supported AFAIK.  However, it has a
descendant amavisd-new which is actively developed and supported.
  http://www.ijs.si/software/amavisd/

  I don't know mailscanner, but amavisd-new is a much more efficient
approach for a mailserver, especially at ISP level.  The two main
differences are 1) Spamassassin (and clamav) get run once per incoming
mail, not once for every recipient, and 2) amavisd-new runs as a
daemon, so Spamassassin only has to be compiled in Perl once instead of
once per incoming message.

 Anyone who uses above solutions?
 
  *Lots* of mailservers use amavisd-new, including many ISPs and 3rd
party mail providers, FWIW.

  -- Clifton

-- 
Clifton Royston  --  [EMAIL PROTECTED] / [EMAIL PROTECTED]
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services


RE: What's the best method to use SA?

2006-10-04 Thread Chris Santerre







 -Original Message-
 From: Monty Ree [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, October 04, 2006 12:44 AM
 To: users@spamassassin.apache.org
 Subject: What's the best method to use SA?
 
 
 Hello.
 
 I have used SA using with procmail.
 and clamav + sendmail(libmilter) against virus.
 
 But I have found that other related solutions like 
 http://www.mailscanner.info/ or 
 http://www.amavis.org/. 
 
 I don't know what's the difference or better between SA using 
 procmail or 
 above solutions.
 more fast or more effective??
 
 Anyone who uses above solutions?


I use sendmail and procmail. I think that combo is very good. Lets you do some neat things. 


--Chris





Re: Stock spam in images

2006-10-04 Thread Jorge Valdes

Jason Haar wrote:

I'm having marvelous luck with FuzzyOCR - but the spammers are learning too.

When I first started using it just a couple of months ago, it really
whacked the image-based spam. You could see why when gocr file.gif
returned nice text that was easy to match against.

However, now is a different matter. I just got a lose weight spam 10
minutes ago that gocr returns as:

  lI__c_tc)r _rc_hc_rihc_Ll _cnLl .h1c_Llic_;cll_ _u__c_c __ihc LI
  l c htc)hlc_rc)c_c_ B llr_ll l hc r_cp_


_ t4 __cc_'un ic) __'ri_c _ hH3s, t_k   _ ,r o_E,y _h K E,_
_ ,_ics r _ sncu)._r. t.ihk). lhirkrr x_))  '   gg __, r
_ Krvc)_H t)r r_irk cct .__ _
 O _' Y O ___ TE_ E
 _Lncl nLnn __ mc)R hnrtb

That tells me to go to www.realhgh dot org , but their GIF processing
munged it enough to slip by gocr

Not much FuzzyOCR can do with that :-(

  
A few days ago, someone provided me with an image that returned garbage 
when using plain 'gocr file'.  The trick to better detection is to 
adjust gocr's -l parameter to get better contrast (and better results).  
By looping 0...255 you will find a setting which will give you good 
results for this type of image, and if you start getting a lot of these 
images, adding another scanset will not add too many cpu cycles to your 
scan.  This new setting will almost certainly give you better results 
with other images too, so unless you have a really overloaded system, 
adding another scanset will not 'break the bank'.
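
The 0...255 loop can be scripted along these lines. This is only a sketch: it assumes gocr is on the PATH and accepts the image format directly, the word list is whatever you expect the spam to contain, and the helper names are made up for illustration. The `ocr` parameter exists so the search logic can be tried without gocr installed.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def run_gocr(image_path: str, level: int) -> str:
    """Run `gocr -l LEVEL image` and return whatever text it prints."""
    result = subprocess.run(
        ["gocr", "-l", str(level), image_path],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def count_hits(text: str, words: Sequence[str]) -> int:
    """Count how many of the expected words appear in the OCR output."""
    lowered = text.lower()
    return sum(1 for word in words if word.lower() in lowered)


def best_levels(
    image_path: str,
    words: Sequence[str],
    levels: Sequence[int] = range(0, 256),
    ocr: Callable[[str, int], str] = run_gocr,
    top: int = 3,
) -> List[Tuple[int, int]]:
    """Return up to `top` (level, hits) pairs, best first, ignoring levels with no hits."""
    scored = [(count_hits(ocr(image_path, lvl), words), lvl) for lvl in levels]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(lvl, hits) for hits, lvl in scored[:top] if hits > 0]


if __name__ == "__main__":
    print(best_levels("file.gif", ["stock", "price", "buy"]))
```

A level that works well for a sample image is a candidate for the extra scanset mentioned above.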


--
Jorge Valdes




bayes_toks.expire.... can I delete these?

2006-10-04 Thread Derek Catanzaro
I have a ton of bayes_toks.expire files listed in /root/.spamassassin.  
Is it safe to delete these files?  I did check the FAQ regarding 
manybayestoksexpirefiles but from what I can tell the directory is not 
set to use sticky bit.  Here is my ls -la results on the directory:


[EMAIL PROTECTED] .spamassassin]# ls -la
total 12803624
drwx--2 root root   163840 Oct  4 11:43 .
drwxr-x---6 root root 4096 Oct  4 11:03 ..
-rw---1 root root12288 May 16  2005 auto-whitelist
---snip---

I have also been experiencing SpamAssassin timed out errors in my 
maillog over the past couple of days, would the bayes_toks.expire files 
have anything to do with this?  If not, I will review the FAQs and
post a new topic if I need assistance.


Running Fedora Core 1
spamassassin 3.1.0
MailScanner 4.49.7
Perl 5.8.1
MTA - sendmail 8.13.5

Thanks,
Derek

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.



Re: What's the best method to use SA?

2006-10-04 Thread qqqq



Sendmail/Procmail

/etc/procmailrc:
:0fw
* < 115000
* ! ^(TO|Cc):.(user1noscan|user2noscan|user3noscan)
* ! ^Return-Path: \\
* ! ^List-Id:.\MUNGED.yahoogroups.com\
* ! ^Disposition-Notification-To:.*MUNGED
* ! ^Received:.(domain1.com|domain2.com|domain3.com)
* ! ^To:.*abuse
* ! ^Message-Id:.*(MessageID1|MessageID2|MessageID3)
| /usr/bin/spamc -d 192.168.0.200 -p 789 -t 60







Re: What's the best method to use SA?

2006-10-04 Thread Andreas Pettersson
I use Exim with the integrated SA ACL.
I'm really pleased with how it works.

http://www.exim.org/exim-html-4.62/doc/html/spec_html/ch40.html


/Andreas



switching from global bayes to per-user bayes

2006-10-04 Thread Adam Lanier
I am looking into switching from a global bayes/awl/setting environment
to a per-user environment with MySQL as a backend.

puts on asbestos suit
Would anyone care to offer an opinion as to whether and/or to what
degree this might make a difference in overall effectiveness?  Can anyone
back up that opinion with cold hard facts?

Will I be able to migrate small sets of users from global to per-user or
will I have to make the jump for all my end-users/domains at once?

I'd like to preload the bayes db for each user so that it's 'primed'
and ready to go.  Obviously, it would be preferable to preload with
their specific mail but is it possible to feed bayes for each user with
a generic set of spam/ham?




Stupid spammer rules: typos in forged headers

2006-10-04 Thread John D. Hardin

describe QMAIL_TYPO Hand-forged Received header with typos
header   QMAIL_TYPO Received =~ /\.[a-z]{1,4}\s\((?!Qmail)Qm[ail]{3}\)\swith\s/
score    QMAIL_TYPO 1.00
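
To see what the negative lookahead in that rule does, here is the same pattern tried in Python against made-up Received fragments (the sample strings are hypothetical, not taken from real spam).

```python
import re

# (?!Qmail) rejects the correct spelling; Qm[ail]{3} then accepts any other
# three-letter combination of a, i and l, i.e. a misspelled "Qmail".
QMAIL_TYPO = re.compile(r"\.[a-z]{1,4}\s\((?!Qmail)Qm[ail]{3}\)\swith\s")


def looks_hand_forged(received_header: str) -> bool:
    return QMAIL_TYPO.search(received_header) is not None


if __name__ == "__main__":
    print(looks_hand_forged("by mail.example.com (Qmial) with SMTP"))  # True
    print(looks_hand_forged("by mail.example.com (Qmail) with SMTP"))  # False
```

The match is case-sensitive, as in the rule itself, which has no /i modifier.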

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You are in a maze of twisty little protocols,
  all written by Microsoft.
--



RE: What's the best method to use SA?

2006-10-04 Thread Shue, Daniel G.
We use SA, ClamAV, Razor, Pyzor, DCC, etc. with amavisd-new and Maia Mailguard.
Maia is a great way to imitate some of the big expensive spam filters out
there.  It gives users a web front end for managing their spam and even their
spam score limits.  It may be a little more than what you want to give your
users, but nonetheless it's a great admin tool for a network admin trying to
stop spam.  In my case, I can go to the users that are getting the most false
negatives and specifically tell Maia that this is spam, NOT ham, and have it train
bayes.  That to me is an awesome tool.  We also have a spam inbox set up, but
nobody uses it and people still want to complain about getting spam.  Check it out;
their web server always seems to be down, so look at the Google cached version
if you can't get there.



From: Clifton Royston [mailto:[EMAIL PROTECTED]
Sent: Wed 10/4/2006 12:51 PM
To: Monty Ree
Cc: users@spamassassin.apache.org
Subject: Re: What's the best method to use SA?



On Wed, Oct 04, 2006 at 04:43:37AM +, Monty Ree wrote:
 Hello.

 I have used SA using with procmail.
 and clamav + sendmail(libmilter) against virus.

 But I have found that other related solutions like
 http://www.mailscanner.info/ or
 http://www.amavis.org/.

 I don't know what's the difference or better between SA using procmail or
 above solutions.
 more fast or more effective??

  The original amavis is not well supported AFAIK.  However, it has a
descendant amavisd-new which is actively developed and supported.
  http://www.ijs.si/software/amavisd/

  I don't know mailscanner, but amavisd-new is a much more efficient
approach for a mailserver, especially at ISP level.  The two main
differences are 1) Spamassassin (and clamav) get run once per incoming
mail, not once for every recipient, and 2) amavisd-new runs as a
daemon, so Spamassassin only has to be compiled in Perl once instead of
once per incoming message.

 Anyone who uses above solutions?

  *Lots* of mailservers use amavisd-new, including many ISPs and 3rd
party mail providers, FWIW.

  -- Clifton

--
Clifton Royston  --  [EMAIL PROTECTED] / [EMAIL PROTECTED]
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services

 

I did write it and can't remove it for policy reasons. preciate the flack 
though!






Re: ImageInfo Bug

2006-10-04 Thread Dallas Engelken

Stuart Johnston wrote:

Dallas,

I think there is a bug in the image_size_range function.

my $name = $type.'_dems';

Should probably be more like:

my $name = "dems_$type";

Thanks,
Stuart
Yup.. Craig Green made me aware of that last week, and I've been too
busy to address it.  I'll get it updated on the SARE side shortly.   I
haven't looked at Theo's sandbox lately, but I'd guess it's incorrect
there as well.


Thanks,

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



light-grey listing..? lkml filter probs catching too much ham.

2006-10-04 Thread Linda Walsh

I'm having problems filtering a list I'm on (lkml).

First I had it on the normal filter -- but I had too many false
positives.  Finally switched it to a white-list, but now, many
false negatives (spam) get through.

Is there a way to light-grey a list -- not a blanket accept
all, white-list, but something that temporarily moves the
spam-high-water mark for that specific email: i.e. instead of
it taking X points to be marked as SPAM, it adds 5-points to
the threshold needed to mark the message as spam?

I heard that the list owners attempted to tighten the filters and
had the same problem -- too many ham emails got trapped.  Perhaps
it is all the code that gets published to that list?  Dunno, but
something seems in common with SPAM and, maybe, code (or at least
the normal linux-kernel-mailing-list post) that is making it a hard
list to police (clean) up.

Anyone else have stubborn lists like this or had successes in filtering
lkml?  I even split off code-ish looking posts to a separate folder,
but that still didn't stop the false negatives, so not quite sure
what makes such a list uniquely difficult to filter.

Not the worst problem -- at least it's confined to that folder,
but the various spams that are present make it a bit challenging to
read -- right in the middle of the tech stuff...just on the first
page of titles (conversations hidden under titles), 2/10 titles are
sex related spams.  It's a bit annoying to read through (sigh).

Now why would sex-spammers target lkml-readers.  Do they think
lkml-readers are uniquely more likely to respond to sex-spam?
(Maybe, given the fascination of the average /. reader and
their amusement with pr0n, there could be some basis to the
spammer's methods...?)...

thanks,
-linda



RE: light-grey listing..? lkml filter probs catching too much ham.

2006-10-04 Thread Coffey, Neal
Linda Walsh wrote:
 Is there a way to light-grey a list -- not a blanket accept
 all, white-list, but something that temporarily moves the
 spam-high-water mark for that specific email:

For mailing lists, I use whitelist_to, which by default subtracts 6
points from the email's score. It works since the emails are all to
the mailing list address, and not to mine. This list, for example,
gets:

whitelist_to users@spamassassin.apache.org
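
To make the arithmetic concrete, here is a minimal sketch of how a -6 adjustment behaves like a higher threshold for list mail. It assumes a required score of 5.0 and the -6 default mentioned above; the function name `is_spam` and the sample scores are hypothetical, and this is not how SpamAssassin is implemented internally.

```python
REQUIRED_SCORE = 5.0        # assumed threshold (the common "required=5" setting)
WHITELIST_TO_SCORE = -6.0   # default whitelist_to adjustment described above

def is_spam(raw_score, to_whitelisted_list):
    """Apply the whitelist_to adjustment, then compare against the threshold."""
    adjustment = WHITELIST_TO_SCORE if to_whitelisted_list else 0.0
    return raw_score + adjustment >= REQUIRED_SCORE

# A message whose other rules add up to 7.5 points:
print(is_spam(7.5, to_whitelisted_list=False))   # tagged when sent directly
print(is_spam(7.5, to_whitelisted_list=True))    # passes when sent to the list
# List mail now needs 11.0 points from other rules before it is tagged.
print(is_spam(11.0, to_whitelisted_list=True))
```

In effect the list gets a threshold of 11 instead of 5, which is close to the "add 5 points to the threshold" behaviour asked about.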


Re: FuzzyOCR request

2006-10-04 Thread Alan Munday

decoder wrote the following on 04/10/2006 21:38:


Alan Munday wrote:

Chris

Could you consider adding a configuration parameter which would have
the effect of scoring all results as zero?

This would allow people to configure FuzzyOCR for their systems in
the knowledge that it will not affect the current running state. It
will also allow people to test the effects of FuzzyOCR on their
current traffic before taking it live.

regards

Alan


This seems like a very good idea. I will implement this as soon as I
am able to continue the development again. At the moment I am busy
with university stuff, but in a few weeks I will have more time again :)

Best regards,


Chris


Chris

Thank you for considering this. 


I've been following your developments and looking at how to integrate with my
(few) systems. But as I don't have a test environment (until I have built a
VMWare one) I was cautious about trying this on one of the live boxes. Zero
scoring seemed to be a good way round this.

regards

Alan


double letter porn

2006-10-04 Thread Richard Doyle
I've been getting lots of porn site spam containing words with doubled
letters, like this one:


Orrgy pornn parrties! Lotts of
sttupid bitchees gangbangged by queue of guyss.
 annal_nailing and cum__swallowing orgiees.
 archiive of group_ssex materiall!
http://www.teens229mx.com/?lcajuryrpdbejn


Most of these hit razor2, and www.teens???mx.com sooner-or-later show up
on the SURBL and URIBL lists, but nothing seems to catch the misspelled
words.

Can anybody suggest a rule or ruleset to catch these double-letter
obfuscations? I'm using Spamassassin 3.1.4.









RE: double letter porn

2006-10-04 Thread Bret Miller
 I've been getting lots of porn site spam containing words with doubled
 letters, like this one:

 
 Orrgy pornn parrties! Lotts of
 sttupid bitchees gangbangged by queue of guyss.
  annal_nailing and cum__swallowing orgiees.
  archiive of group_ssex materiall!
 http://www.teens229mx.com/?lcajuryrpdbejn
 

 Most of these hit razor2, and www.teens???mx.com
 sooner-or-later show up
 on the SURBL and URIBL lists, but nothing seem to catch the misspelled
 words.

 Can anybody suggest a rule or ruleset to catch these double-letter
 obfuscations? I'm using Spamassassin 3.1.4.

Network tests...

That hit URIBL_Black and the SURBL JP and OB tests.

I'm sure a rule *could* be written, but those are common double-letter
combinations, so it would be a bit more difficult than it seems.

Bret





Re: double letter porn

2006-10-04 Thread Eric A. Hall

On 10/4/2006 5:57 PM, Richard Doyle wrote:
 I've been getting lots of porn site spam containing words with doubled
 letters, like this one:

 Can anybody suggest a rule or ruleset to catch these double-letter
 obfuscations? I'm using Spamassassin 3.1.4.

You'd probably need to write a plug-in that used some kind of
typo-matching logic to find porno words.

Would be a good plug-in actually. Get busy :)

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/


Re: bayes_toks.expire.... can I delete these?

2006-10-04 Thread Matt Kettler
Derek Catanzaro wrote:
 I have a ton of bayes_toks.expire files listed in
 /root/.spamassassin.  Is it safe to delete these files?  
Yes, provided no expire process is currently running and using one.
 I did check the FAQ regarding manybayestoksexpirefiles but from what
 I can tell the directory is not set to use sticky bit.  Here is my ls
 -la results on the directory:

snip
 Running Fedora Core 1
 spamassassin 3.1.0
 MailScanner 4.49.7
1) run sa-learn --force-expire to fix the immediate problem.

2) prevent future problems by fixing your spamassassin timeout value in
MailScanner.conf

Anything under 600 seconds is bad news if you use bayes. In fact, I'd
set it to 3000 seconds. I use MailScanner myself, and have since the SA
2.31 days, and I've NEVER had MailScanner time out a SA process for any
valid reason. I've only had it time out because the timeout value was
too short.

In this case, MailScanner doesn't know that SA is taking a long time
because it's doing its bayes database maintenance. Therefore, it
assumes SA is in an infinite loop or some other bogus state (which I've
NEVER had happen, nor have I ever even heard of happening to SA), and
kills it.

If this keeps happening, your bayes database will grow without bound and
consume your entire disk. SA NEEDS to expire the bayes tokens at some
point, and this is a very slow process.

Some history about MS and its timeouts. I've only had one other
situation of timeouts other than bayes.

When I started using MS, it had a SA timeout value equal to the default
RBL timeout in SA. At the time SA just used a fixed 15 second timeout,
and MS only gave SA 15 seconds to run. SA didn't do its modern dynamic
timeout, so it would always wait 15 seconds, even if it was only waiting
on one RBL. Since SA also took a non-zero amount of time to get to the
point it invoked the RBL, a dead RBL would always result in SA taking
slightly more than 15 seconds to complete. Therefore, if an RBL ever
failed, MS would kill it just before SA would have given up on the RBL,
and you'd wind up with an un-scored message.
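
The race described above comes down to simple arithmetic. The sketch below is only an illustration of that reasoning: the function name and the 0.5 second startup figure are assumptions, not measurements from MailScanner or SpamAssassin.

```python
def watchdog_kills_scan(startup_s, rbl_timeout_s, watchdog_s):
    """True if the outer watchdog fires before the scanner gives up on a dead RBL.

    startup_s     -- time the scanner needs before it issues the RBL lookup
    rbl_timeout_s -- fixed time the scanner waits on an unresponsive RBL
    watchdog_s    -- time the wrapper allows the whole scan to run
    """
    scan_duration = startup_s + rbl_timeout_s
    return scan_duration > watchdog_s

# Old behaviour: both timeouts are 15 seconds, so any startup cost loses the race.
print(watchdog_kills_scan(startup_s=0.5, rbl_timeout_s=15, watchdog_s=15))
# With a generous outer timeout the scan finishes and the message gets scored.
print(watchdog_kills_scan(startup_s=0.5, rbl_timeout_s=15, watchdog_s=600))
```

The same reasoning applies to bayes expiry: the outer timeout has to be comfortably longer than the slowest legitimate thing the scanner may do.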

Needless to say the timeout feature of MailScanner is one of my least
favorite features of MS, because it seems it always does the wrong thing.





Re: double letter porn

2006-10-04 Thread John D. Hardin
On Wed, 4 Oct 2006, Eric A. Hall wrote:

 On 10/4/2006 5:57 PM, Richard Doyle wrote:
  I've been getting lots of porn site spam containing words with doubled
  letters, like this one:
 
  Can anybody suggest a rule or ruleset to catch these double-letter
  obfuscations? I'm using Spamassassin 3.1.4.
 
 You'd probably need to write a plug-in that used some kind of
 typo-matching logic to find porno words.

/\bss?ee?xx?\b/i
/\boo?rr?gg?yy?\b/i
/\boo?rr?gg?ii?ee?ss?\b/i

etc...
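
Writing one such pattern per word by hand gets tedious, so here is a rough sketch of generating them from a word list. It is a variation on the patterns above, not the same thing: a negative lookahead is added so the correctly spelled word does not hit, which is my own assumption about what is wanted. The function name `doubled_letter_pattern` is hypothetical.

```python
import re

def doubled_letter_pattern(word):
    """Build a regex where each letter of `word` may appear once or twice.

    The lookahead rejects the plain spelling, so only obfuscated forms match.
    """
    body = "".join(f"{re.escape(c)}{re.escape(c)}?" for c in word)
    return re.compile(rf"\b(?!{re.escape(word)}\b){body}\b", re.IGNORECASE)

orgy = doubled_letter_pattern("orgy")
for text in ("Orrgy pornn parrties!", "orgy", "orgyy", "organ"):
    print(bool(orgy.search(text)), text)
```

One caveat: "_" counts as a word character, so \b does not match between "_" and a letter, and a token like "group_ssex" from the sample above would slip past a pattern anchored with \b.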

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 [Small arms] are fundamentally dangerous and their removal from the
 equation either by control, neutralisation or removal is essential.
 The first step is to gain information on their numbers and
 whereabouts. -- the UN, who doesn't want to confiscate guns
---



Forged X-Spam headers

2006-10-04 Thread Christopher Martin
I have been noticing the occasional spam slipping past spam assassin 
unscathed lately but have been a bit busy to pay attention (one spam a 
day is much better than the 150 each user used to get). I paid a bit 
more attention to one the other day and noticed it had an X-Spam header 
before it got to spam assassin. For a few seconds I thought that maybe 
my ISP had started tagging silently, until I noticed that the spam score 
was -83... Not the positive score it should have been, so I deduced that 
spammers are forging the X-Spam header to slip by the classification rules.


I had a search on the Nabble archive for this list and couldn't find 
anything specifically about this (it probably got lost in the million 
results that just about any search phrase produces!) so I am hoping 
someone can point me at a solution if it's been discussed before.


Is there an option for spamass-milter to strip X-Spam headers
before the mails are handed to Spam Assassin for processing? If not, is
there another milter I will need to use? I guess I can put it in between
milter-regex and spamass-milter.
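
For illustration only, the stripping step itself is small. The sketch below uses Python's standard email module to drop every header whose name starts with "X-Spam-" before a message would be passed on for scanning. It is not a milter and not an option of any existing tool; the function name and the sample message are made up.

```python
from email import message_from_string

def strip_x_spam_headers(raw_message):
    """Return the message text with all X-Spam-* headers removed."""
    msg = message_from_string(raw_message)
    # Collect names first; deleting by name removes every occurrence.
    forged = {name for name in msg.keys() if name.lower().startswith("x-spam-")}
    for name in forged:
        del msg[name]
    return msg.as_string()

# Hypothetical incoming message carrying a forged score.
raw = (
    "X-Spam-Status: No, score=-83.0 required=5.0\n"
    "X-Spam-Flag: NO\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)
print(strip_x_spam_headers(raw))
```

Whatever does the stripping has to run before the scanner adds its own X-Spam headers, otherwise the real results would be removed too.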


Any ideas?

Chris M


Re: double letter porn

2006-10-04 Thread Ken

John D. Hardin wrote:

On Wed, 4 Oct 2006, Eric A. Hall wrote:

  

On 10/4/2006 5:57 PM, Richard Doyle wrote:


I've been getting lots of porn site spam containing words with doubled
letters, like this one:
  
Can anybody suggest a rule or ruleset to catch these double-letter

obfuscations? I'm using Spamassassin 3.1.4.
  

You'd probably need to write a plug-in that used some kind of
typo-matching logic to find porno words.



/\bss?ee?xx?\b/i
/\boo?rr?gg?yy?\b/i
/\boo?rr?gg?ii?ee?ss?\b/i

  


Seeing the same here: some targeted porn spam with doubled-up letters in
the subject, usually scoring 2-3 due to various SA tests on rcvd lines,
with very short (2 line) bodies and urls that are not yet listed on surbl,
uribl or dob (day old bread). Typically they also include somewhat odd
adjectives, like audacious, immaculate, etc... I've just been reacting
with rules similar to what is suggested above, with some success, but it's
got me wondering if there isn't another list that I can find these on.

Ken  Anderson

etc...

--
 John Hardin KA7OHZICQ#15735746http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]FALaholic #11174pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 [Small arms] are fundamentally dangerous and their removal from the
 equation either by control, neutralisation or removal is essential.
 The first step is to gain information on their numbers and
 whereabouts. -- the UN, who doesn't want to confiscate guns
---

  




Re: bayes_toks.expire.... can I delete these?

2006-10-04 Thread Derek Catanzaro

Matt Kettler wrote:

Derek Catanzaro wrote:
  

I have a ton of bayes_toks.expire files listed in
/root/.spamassassin.  Is it safe to delete these files?  


Yes, provided no expire process is currently running and using one.
  
I did wind up deleting all of the bayes_toks.expire files, there were 
hundreds.

1) run sa-learn --force-expire to fix the immediate problem.
  
After deleting the bayes_toks.expire files I ran sa-learn --force-expire 
and received the result below and it just stayed there for at least 20 
minutes so I forced it to stop.  Is this normal behavior?  Was I too 
impatient with the process?  My bayes_toks file is 321MB, not sure if 
that is part of the issue.


.spamassassin]# sa-learn --force-expire
bayes: synced databases from journal in 0 seconds: 1611 unique entries 
(2099 total entries)

2) prevent future problems by fixing your spamassassin timeout value in
MailScanner.conf

Anything under 600 seconds is bad news if you use bayes. In fact, I'd
set it to 3000 seconds. I use MailScanner myself, and have since the SA
2.31 days, and I've NEVER had MailScanner time out a SA process for any
valid reason. I've only had it time out because the timeout value was
too short.
  
Matt, After posting this to the list I did some more research online and 
found the following thread which you responded to.  I have applied the 
settings listed in this thread to my MS/SA setup.  Do these settings 
still apply in your opinion?  The thread recommends a minimum of 60 
seconds for the spamassassin timeout value, mine is set to 75.  Based on 
what you are saying above I believe I need to increase the spamassassin 
timeout dramatically, can you confirm?  Since I deleted the 
bayes_toks.expire files there has been 1 .expire file generated already, 
so I'm assuming that should tell me my timeout is still too low?


http://mail-archives.apache.org/mod_mbox/spamassassin-users/200410.mbox/[EMAIL 
PROTECTED]

Thanks for all of the information you provided.  I really appreciate the 
assistance. 


Thanks,
Derek


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.