Re: Stock spam in images
I'm having marvelous luck with FuzzyOCR - but the spammers are learning too. When I first started using it just a couple of months ago, it really whacked the image-based spam. You could see why when gocr file.gif returned nice text that was easy to match against. However, now is a different matter. I just got a lose weight spam 10 minutes ago that gocr returns as: lI__c_tc)r _rc_hc_rihc_Ll _cnLl .h1c_Llic_;cll_ _u__c_c __ihc LI l c htc)hlc_rc)c_c_ B llr_ll l hc r_cp_ _ t4 __cc_'un ic) __'ri_c _ hH3s, t_k _ ,r o_E,y _h K E,_ _ ,_ics r _ sncu)._r. t.ihk). lhirkrr x_)) ' gg __, r _ Krvc)_H t)r r_irk cct .__ _ O _' Y O ___ TE_ E _Lncl nLnn __ mc)R hnrtb That tells me to go to www.realhgh dot org , but their GIF processing munged it enough to slip by gocr Not much FuzzyOCR can do with that :-( -- Cheers Jason Haar Information Security Manager, Trimble Navigation Ltd. Phone: +64 3 9635 377 Fax: +64 3 9635 417 PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
Re: Score=x ?
On Wed, October 4, 2006 05:59, M.Lewis wrote: X-Spam-Status: No, score=x tagged_above=-999 required=5 tests=[]
Check amavisd.conf to see whether $sa_mail_body_size_limit is higher than the size of this mail; if it is, amavisd disabled scanning of this mail.
-- This message was sent using 100% recycled spam mails.
Re: Score=x ?
On Wed, October 4, 2006 09:18, Benny Pedersen wrote: On Wed, October 4, 2006 05:59, M.Lewis wrote: X-Spam-Status: No, score=x tagged_above=-999 required=5 tests=[]
check amavisd.conf for $sa_mail_body_size_limit is higher than the mail size in this mail, if it is amavisd disabled scanning of this mail
Sorry, I meant lower, not higher: amavisd skips scanning when the limit is lower than the mail size. This is the only situation in which I have seen score=x with amavisd. You can raise the size limit so that these mails get scanned as well; just don't set the limit too high, but keep it high enough that spam does not get through unscanned.
-- This message was sent using 100% recycled spam mails.
Re: Score=x ?
Benny Pedersen wrote: On Wed, October 4, 2006 05:59, M.Lewis wrote: X-Spam-Status: No, score=x tagged_above=-999 required=5 tests=[]
check amavisd.conf for $sa_mail_body_size_limit is higher than the mail size in this mail, if it is amavisd disabled scanning of this mail
Thanks Benny. I seriously doubt that was it, as the message in question was 268KB. However, I will check it out. Thank you very much!
Mike
-- Software engineer: One who engineers others into writing the code for him/her. 02:35:01 up 2:18, 7 users, load average: 0.46, 0.31, 0.24 Linux Registered User #241685 http://counter.li.org
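The size check discussed in this thread can be sketched roughly as follows. This is only an illustration of the comparison, not amavisd's actual code: the function name is made up, the 200 KiB limit is a hypothetical value (the real one is whatever $sa_mail_body_size_limit is set to in amavisd.conf), and reading "268KB" as 268 * 1024 bytes is an assumption.

```python
KIB = 1024


def amavisd_would_scan(mail_size_bytes: int, sa_size_limit_bytes: int) -> bool:
    """Return True if the mail is small enough to be passed to SpamAssassin.

    Illustrative only: mails larger than the limit are assumed to be
    skipped, which is when a header like score=x tests=[] would show up.
    """
    return mail_size_bytes <= sa_size_limit_bytes


assumed_limit = 200 * KIB  # hypothetical limit, not taken from the thread
mail_size = 268 * KIB      # the message size Mike reports

print(amavisd_would_scan(mail_size, assumed_limit))  # prints False
```

With a limit like the assumed one, a 268KB message would be skipped, so the configured value is worth checking before ruling this explanation out.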
Re: perl hogging my memory?
hey, feel free to edit around that FAQ too, Matt ;) Right now I think that question really *is* the most FA'd Q. --j.

Matt Kettler writes: Woot!! Thank you Justin and the rest of the Wiki crew for putting that up! I was getting tired of writing the "Are you using sa-blacklist.cf?" email over, and over again.

Justin Mason wrote: have you looked at http://wiki.apache.org/spamassassin/OutOfMemoryProblems ? note especially the 'Heavyweight custom rules' section. --j.

Evan Platt writes: Ok, I've googled and obviously I'm not finding the right solution. But I had to reinstall spamassassin on my OS X 10.4 box. Followed http://developer.apple.com/server/fighting_spam.html . But my system is running out of memory, and it looks like Perl / spamassassin is the cause. I've omitted everything but the Perl and Spamassassin related entries:

Load Avg: 1.97, 1.36, 0.78 CPU usage: 84.4% user, 15.6% sys, 0.0% idle
SharedLibs: num = 106, resident = 3.54M code, 364K data, 780K LinkEdit
MemRegions: num = 4984, resident = 217M + 1.37M private, 236M shared
PhysMem: 44.7M wired, 307M active, 153M inactive, 506M used, 5.54M free
VM: 4.00G + 79.0M 50554(137) pageins, 65232(79) pageouts

  PID COMMAND   %CPU   TIME     #TH #PRTS #MREGS  RPRVT   RSHRD   RSIZE   VSIZE
  448 spamc     0.0%   0:00.00    1    15     18   128K   268K-    396K   27.7M
  447 procmail  0.0%   0:00.00    1     8     16     8K-  364K-    176K   26.7M
  445 procmail  0.0%   0:00.02    1    15     16     8K-  364K-    412K   26.7M
  416 perl     35.1%   0:10.60    1    10    391  30.4M   233M-   94.2M    391M
  394 spamc     0.0%   0:00.00    1    15     18    88K   268K-    356K   27.7M
  393 procmail  0.0%   0:00.00    1     8     16     8K   316K-    172K   26.7M
  391 procmail  0.0%   0:00.02    1    15     16     8K   316K-    364K   26.7M
  378 perl     10.1%   0:48.50    1    10    388   150M+  207M-    217M+   391M
  377 perl     44.7%   1:18.63    1    10    388  26.3M   233M-   72.8M    391M
  271 perl      0.0%   0:00.12    1    10     43  1.93M   284K    1.07M   29.1M
   65 perl      0.0%   3:41.24    1    15    387  1.43M-  233M-   56.9M-   391M

So what did I do wrong that's causing a Perl process to take up 391 megs? Obviously, I'm only guessing it's spamassassin related, but that's the only thing I can think of using perl. And I see a few Google references to spamassassin and perl. Any other information I can provide, please let me know. Thanks. Evan
Re: HELO test rule-writing questions
Clifton Royston writes: I'm trying to write some SA rules for additional tests on the connecting mailserver's SMTP HELO string, and I have some questions about how to do it. Should I send them to this list or to the dev list?

hey Clifton! -- yep, this list.

Assuming it's this list, one of the things I'm trying to do is assign a modest score to helo strings containing a bracketed IP address. (This is technically valid in SMTP.) I've read through some of the tests in 20_fake_helo_tests.cf, and it appears they rely on SA's parsing code creating a kind of magic pseudo-header X-Spam-Relays-Untrusted containing a string with the helo and other data? I'm not sure I get the point of the recurring [^\]]+ bits in the examples I looked at.

So, the deal is that 'X-Spam-Relays-Untrusted' will contain *all* untrusted relays, one after the other. /^[^\]]+ / ensures that only the helo string from the *most recent* untrusted relay -- the handover into the trusted networks -- is checked. This is required because it's perfectly fine for a user's MUA to use this kind of helo string; the spammy case is when an MTA which is supposedly run by an ISP is handing it over to the recipient's MX, and that one should not use that style of helo. See http://wiki.apache.org/spamassassin/TrustedRelays for more info.

So would a test for a bracketed IP address look like this?

# [60.222.35.88]
header HELO_BRACKETED_IP X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=\[\d+\.\d+\.\d+\.\d+\][^\]]+ auth= /i

I want to distinguish this case from a bare IP address (invalid!) which I also want to look at and score:

# [60.222.35.88]
header HELO_BARE_IP X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=\d+\.\d+\.\d+\.\d+[^\]]+ auth= /i

both look good. be sure to let us know if you find something useful ;) --j.
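To see why the leading ^[^\]]+ confines a match to the first relay entry, here is a small sketch that runs the HELO_BARE_IP pattern from the post through Python's re module. The two pseudo-header strings are invented for illustration, and their exact field layout is an assumption; real X-Spam-Relays-Untrusted values may differ between SpamAssassin versions.

```python
import re

# The HELO_BARE_IP pattern quoted above, translated to Python syntax.
HELO_BARE_IP = re.compile(
    r"^[^\]]+ helo=\d+\.\d+\.\d+\.\d+[^\]]+ auth= ", re.IGNORECASE
)

# Hypothetical pseudo-header: the most recent untrusted relay (first entry)
# used a bare IP address as its HELO string.
FIRST_RELAY_BARE_IP = (
    "[ ip=60.222.35.88 rdns= helo=60.222.35.88 by=mx.example.org "
    "ident= envfrom= intl=0 id=AAA auth= ] "
    "[ ip=192.0.2.7 rdns=a.example.net helo=a.example.net by=b.example.net "
    "ident= envfrom= intl=0 id=BBB auth= ]"
)

# Hypothetical pseudo-header: only an earlier relay (second entry) used a
# bare IP address, e.g. a user's MUA talking to its ISP.
SECOND_RELAY_BARE_IP = (
    "[ ip=192.0.2.7 rdns=a.example.net helo=a.example.net by=mx.example.org "
    "ident= envfrom= intl=0 id=AAA auth= ] "
    "[ ip=60.222.35.88 rdns= helo=60.222.35.88 by=a.example.net "
    "ident= envfrom= intl=0 id=BBB auth= ]"
)


def rule_hits(relays: str) -> bool:
    """Return True if the pattern matches the given pseudo-header value."""
    return HELO_BARE_IP.search(relays) is not None


print(rule_hits(FIRST_RELAY_BARE_IP))   # prints True
print(rule_hits(SECOND_RELAY_BARE_IP))  # prints False
```

Because [^\]]+ cannot cross the first closing bracket, the helo= field has to be found inside the first entry, which is the behaviour Justin describes.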
RE: Problem with URIBL rules : false positive and not listed while mannually checking
What version of SpamAssassin are you running? Versions before 3.1 have an infrequent DNS query bug: http://bugzilla.spamassassin.org/show_bug.cgi?id=3997

I'm running SpamAssassin version 3.0.5 (on Perl 5.8.6). I've checked the bugzilla page about this bug. I don't understand a damn thing 8-|... I guess that I need to update my spamassassin setup, and I'm scared. I'm gonna check the wiki for advice on spamassassin updates, but first, get a horseshoe and recite a hundred mantras!

Another possibility is that there is a DNS proxy or DNS modification service like OpenDNS changing the DNS results in a way that's not compatible with SURBL applications: http://www.surbl.org/faq.html#opendns

I don't run any DNS service on this box. It's a clean MailScanner VM and I don't see any process named 'dns' with ps ax.

In any case, none of the domains mentioned are blacklisted, so there is a problem with your SpamAssassin or DNS.

About the checks, did you use http://www.rulesemporium.com/cgi-bin/uribl.cgi ? Do you know a way to see the result for each test (PH, OB, etc.)?

Thank you for this answer, Jeff
Re: Spamassassin Rules
On Tuesday, October 3, 2006, 6:57:21 PM, Loren Wilton wrote: If you don't have network rules enabled you should enable them. The URIBL-type rules will probably catch the vast majority of this junk. Most of the mis-spelled pharma stuff I get scores around 50. See: http://www.surbl.org/ Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Problem with URIBL rules : false positive and not listed while mannually checking
I'm running SpamAssassin version 3.0.5. (On Perl 5.8.6). I've checked the bugzilla page about this bug. I don't understand a damn thing 8-|... I guess that I need to update my spamassassin setup and I'm scared. I'm gonna check the wiki for advice on spamassassin updates, but first, get a horse shoe, and recite a hundred mantras !

Updating from 3.0.5 to the current version isn't particularly painful; certainly not as hard as 2.6 to 3.x was. The main thing to look out for is features that have moved to plugins; in some cases you will have to enable the plugins to keep your current functionality. Just look for the *.pre files and uncomment anything that seems appropriate.
Re: Problem with URIBL rules : false positive and not listed while mannually checking
Loren Wilton writes: I'm running SpamAssassin version 3.0.5. (On Perl 5.8.6). I've checked the bugzilla page about this bug. I don't understand a damn thing 8-|... I guess that I need to update my spamassassin setup and I'm scared. I'm gonna check the wiki for advice on spamassassin updates, but first, get a horse shoe, and recite a hundred mantras !

Updating from 3.0.5 to the current version isn't particularly painful; certainly not as hard as 2.6 to 3.x was. The main thing to look out for is features that have moved to plugins; in some cases you will have to enable the plugins to keep your current functionality. Just look for the *.pre files and uncomment anything that seems appropriate.

and read the UPGRADE file -- these things are all called out there. --j.
Re: FuzzyOCR seems to not like gif and png
Loren Wilton wrote: There are newer versions of FuzzyOCR that probably fix or at least get around this. A lot of image spam mails have broken images in them, and this messes up a lot of stuff. The latest versions use ImageMagick. This is reputedly hard to install on many systems. But if you can get it installed it seems to work much better in terms of the images that it can handle. You might want to join the FuzzyOCR mailing list:
List-Id: devel-spam.lists.own-hero.net
List-Unsubscribe: http://lists.own-hero.net/mailman/listinfo/devel-spam, mailto:[EMAIL PROTECTED]
List-Archive: http://lists.own-hero.net/mailman/private/devel-spam
List-Post: mailto:[EMAIL PROTECTED]
List-Help: mailto:[EMAIL PROTECTED]
List-Subscribe: http://lists.own-hero.net/mailman/listinfo/devel-spam, mailto:[EMAIL PROTECTED]
If you search the list archive you will see a number of posts on the current release and where to get it. I think the current version is something like J.

The current version is b. J is a devel version, as are all versions higher than b. Please keep that in mind when trying out these versions. A new stable version will follow soon, once I get the time again.
Chris
Loren
RE: Spamassassin Rules
-Original Message- From: Jeff Chan [mailto:[EMAIL PROTECTED]] Sent: Wednesday, October 04, 2006 6:13 AM To: Loren Wilton Cc: users@spamassassin.apache.org Subject: Re: Spamassassin Rules
On Tuesday, October 3, 2006, 6:57:21 PM, Loren Wilton wrote: If you don't have network rules enabled you should enable them. The URIBL-type rules will probably catch the vast majority of this junk. Most of the mis-spelled pharma stuff I get scores around 50.
See: http://www.surbl.org/
Taste: http://www.uribl.com/
Sorry Jeff I couldn't resist. I'm in a weird mood today. ;) Much love for the SURBL team! Go Patriots!
Thanks, Chris Santerre SysAdmin and Spamfighter www.rulesemporium.com www.uribl.com
Failing install - CPAN NET::DNS
Hi all

I tried to install Net::DNS via CPAN to get my network tests going, but I get the following error report at the end:

sudo cpan -i Net::DNS
snip
Running make test
PERL_DL_NONLAZY=1 /usr/local/bin/perl -MExtUtils::Command::MM -e test_harness(0, 'blib/lib', 'blib/arch') t/*.t
t/00-load ... ok
t/00-pod ... skipped
        all skipped: Test::Pod v0.95 required for testing POD
t/00-version ... ok
t/01-resolver-env ... ok
t/01-resolver-file ... ok
        7/8 skipped: Could not read configuration file
t/01-resolver-opt ... ok
t/01-resolver ... ok
t/02-header ... ok
t/03-question ... ok
t/04-packet-unique-push ... ok
t/04-packet ... ok
t/05-rr-opt ... ok
t/05-rr-rrsort ... ok
t/05-rr-sshfp ... skipped
        all skipped: Digest::BubbleBabble not installed.
t/05-rr-txt ... ok
t/05-rr-unknown ... ok
t/05-rr ... ok
t/06-update ... ok
t/07-misc ... ok
t/08-online ... ok 73/93
# Failed test 'Socket is ready'
# in t/08-online.t at line 176.
t/08-online ... ok 93/93
# Looks like you failed 1 test of 93.
t/08-online ... dubious
        Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 74
        Failed 1/93 tests, 98.92% okay (less 3 skipped tests: 89 okay, 95.70%)
t/09-tkey ... ok
t/10-recurse ... ok
t/11-escapedchars ... # Using the XS compiled dn_expand function
t/11-escapedchars ... ok 96/141
# # disabling XS based dns_expand for a moment.
t/11-escapedchars ... ok 99/141
# # Continuing to use the XS based dn_expand()
t/11-escapedchars ... ok
t/11-inet6 ... ok
        10/11 skipped: Socket6 and or IO::Socket::INET6 not loaded
Failed Test    Stat Wstat Total Fail  Failed  List of Failed
---
t/08-online.t     1   256    93    1   1.08%  74
2 tests and 20 subtests skipped.
Failed 1/24 test scripts, 95.83% okay. 1/1057 subtests failed, 99.91% okay.
make: *** [test_dynamic] Error 2
/usr/bin/make test -- NOT OK
Running make install
make test had returned bad status, won't install without force

Any tips on how to tweak things so the test passes, or do you think it is safe to use a bit of force? Could it be the missing IO::Socket::INET6?

Setup: OSX 10.3.9, Communigate 4.2.8, CGPSA 1.4, SA 3.1.3

Thomas Ericsson
Fido Film AB, Stadsgården 17, SE-116 45 Stockholm
T: int+46 (0)8 556 990 06 F: int+46 (0)8 556 990 01 http://www.fido.se
RE: Spamassassin Rules
Yes, spamassassin definitely RULES! ;-D
Re: perl hogging my memory?
I definitely have to say that between OutOfMemoryProblems and TrustPath we've probably covered about 20% of the problems on the list :) Justin Mason wrote: hey, feel free to edit around that FAQ too, Matt ;) Right now I think that question really *is* the most FA'd Q. --j.
R: perl hogging my memory?
I dream of an amavis+spamassassin system developed in C++, with a separate rule compiler and matching daemon... Also, running a Perl regex match for each (enabled) rule is a bit brain damaged... Why not invert the flow and build something like flex + bison, i.e. a grammar parser: you feed it your text, and it replies with the rules that were hit. Well, just an idea. PS: brain damage is just a euphemism: it actually works!
giampaolo

I definitely have to say that between OutOfMemoryProblems and TrustPath we've probably covered about 20% of the problems on the list :)

Justin Mason wrote: hey, feel free to edit around that FAQ too, Matt ;) Right now I think that question really *is* the most FA'd Q. --j.
Re: Problem with URIBL rules : false positive and not listed while mannually checking
On Wednesday, October 4, 2006, 3:11:16 AM, Fabien GARZIANO wrote: Another possibility is that there is a DNS proxy or DNS modification service like OpenDNS changing the DNS results in a way that's not compatible with SURBL applications: http://www.surbl.org/faq.html#opendns

I dont run any dns service on this box ... It's a clean MailScanner VM and I dont see no process named 'dns' with ps ax

There's usually some DNS service on the box or on your local or ISP network. If you're on a Unix/Linux/BSD box it's usually called 'named'. As long as DNS isn't doing anything unusual, then it's a non-issue. Just use normal, default DNS service if your message volume is less than 100k to 250k per day.

In any case, none of the domains mentioned are blacklisted, so there is a problem with your SpamAssassin or DNS. About the checks, did you use http://www.rulesemporium.com/cgi-bin/uribl.cgi ?

I did a local DNS query:

dig somedomain.com.multi.surbl.org a

If you get NXDOMAIN then it's not listed.

Do you know a way to see result for each test (PH, OB, etc ... ) ?

dig somedomain.com.multi.surbl.org txt

will show the lists; so will the lookup page, and so will:

spamassassin -D some_message_in_a_file

Cheers, Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
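The manual dig check above can be approximated in a few lines of Python. This is a rough sketch, not part of SURBL or SpamAssassin: the function names are made up, and treating every resolver error as "not listed" is a simplifying assumption (a real check should distinguish NXDOMAIN from a timeout or a broken resolver).

```python
import socket
from typing import Callable, Optional


def surbl_query_name(domain: str, zone: str = "multi.surbl.org") -> str:
    """Build the name to look up, e.g. somedomain.com.multi.surbl.org."""
    return f"{domain.strip('.').lower()}.{zone}"


def lookup_a(name: str) -> Optional[str]:
    """Resolve an A record; return None when the name does not resolve."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def is_listed(
    domain: str, resolver: Callable[[str], Optional[str]] = lookup_a
) -> bool:
    """A domain counts as listed if its SURBL name resolves to anything."""
    return resolver(surbl_query_name(domain)) is not None
```

Calling is_listed("nortel.com") performs the same query as the dig example; passing a different resolver function makes the logic testable without touching the network.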
RE: Problem with URIBL rules : false positive and not listed while mannually checking
I did a local DNS query: dig somedomain.com.multi.surbl.org a If you get NXDOMAIN then it's not listed. Do you know a way to see result for each test (PH, OB, etc ... ) ? dig somedomain.com.multi.surbl.org txt will show the lists; so will the lookup page, and so will: spamassassin -D some_message_in_a_file

Thanks a lot for the tip with dig. That's what I was looking for.

There's usually some DNS service on the box or on your local or ISP network. If you're on a Unix/Linux/BSD box it's usually called 'named'. As long as DNS isn't doing anything unusual, then it's a non-issue. Just use normal, default DNS service if your message volume is less than 100k to 250k per day.

As for DNS, I'm sorry, I typed too fast: when I said there was no 'dns' process, I also meant there was no 'named' process. On this box, I tried:

# dig nortel.com.multi.surbl.org a

It returned NXDOMAIN as you said, so I guess it may not be a DNS problem on this box. (The DNS server answering is my ISP's.) I think I'm going to update SpamAssassin anyway; this is a good reason to do it. Thanks for all these good answers!

P.S.: sorry Jeff if you receive this email twice
FuzzyOCR gocr spins wheels
Hi people,

I've got a problem with FuzzyOCR. I'm using version 2.3j but had the same problem with 2.1. At points throughout the day there will be 3 - 4 instances of gocr running, all fighting over CPU. They run forever, although I thought that SA would kill the processes if they took too long, which would be fine with me. Anyway, this of course slows the box, and mail will back up. I know that when I was at 2.1 I went with GIFLIB ver .41 instead of .40. From there I couldn't apply the patch that the plug-in comes with. I thought that this would be the culprit if there was one, but I thought I'd check here first before I got into the hassle of removing some libraries.

TIA Daniel

This email and any files transmitted with it are confidential and intended for use only by the individual or entity named above. If you are not the intended recipient or the employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any disclosure, dissemination, distribution, copying of this communication, or unauthorized use is strictly prohibited. Please notify us immediately by reply email and then delete this message from your system. Please note that any views or opinions presented in this email are solely those of the author and do not necessarily represent those of Randolph County Government. This email and any file attachments have been scanned for potential viruses; however, the recipient should check this email for the presence of viruses and/or malicious code. Randolph County accepts no liability for any damage transmitted via this email.
ImageInfo Bug
Dallas, I think there is a bug in the image_size_range function.

my $name = $type.'_dems';

Should probably be more like:

my $name = "dems_$type";

Thanks, Stuart
RE: Stock spam in images
Greetings list,

The old timers on the list know I tend to try things outside the norm, like my strong resistance to sitewide bayes. Well, for months I've been using a simpler approach to these stock spams w/ images. I don't look at the image at all. Heresy I know, but that's the way I roll :)

This goes back to my old philosophy: one rule hit (either FP, FN, or legit) should not make a message an FP, FN, or legit on its own. With that in mind, I wrote a series of 3-4 simple rules, scored them low, and watched the results. These are unpublished rules, and I'm not sure they are ready to be published just yet. But this is the idea of what I'm doing.

Simple example: Is there even an inline image attached? (note: I'm talking about a src="" here, not an attached image to the email!) Well if there is, why not add low points? Which is what I do. I actually score this at a crazy 1.5! Before you scream to the heavens that I'm nuts, let me continue.

EVERY ONE of these stock image spams has hit multiple rules: SARE rules, standard rules, and the 3-4 rules I wrote from finding the simple patterns in these spams. This is the key. Combined rule hits mark it as spam. I've yet to see a single FP caused by ONE of these rules. Sure, if a legit mail comes thru with a src="" it will hit the rule. But I've never seen one that also hit the other rules and was pushed over the marking threshold.

This is not a new idea by any means, but one that seems to be lost under newfangled FuzzyOCR. I think FuzzyOCR is wonderful. ImageInfo is great! But IMHO, they waste too many CPU cycles and too much energy. Spammers are already trying animated gifs and noise. I wanted to quietly give this method a try, and it seems to be working beautifully. I say my rules aren't ready for publishing because for the public I'd like the rules to be tighter, probably used as metas to reduce FPs in general world usage.

Anyway, I just wanted to say that sometimes the simple ways still work great!
(Any spelling errors in this post are your fault!) Thanks, Chris Santerre SysAdmin and Spamfighter www.rulesemporium.com www.uribl.com
Re: What's the best method to use SA?
On Wed, Oct 04, 2006 at 04:43:37AM +, Monty Ree wrote: Hello. I have used SA using with procmail. and clamav + sendmail(libmilter) against virus. But I have found that other related solutions like http://www.mailscanner.info/ or http://www.amavis.org/. I don't know what's the difference or better between SA using procmail or above solutions. more fast or more effective?? The original amavis is not well supported AFAIK. However, it has a descendant amavisd-new which is actively developed and supported. http://www.ijs.si/software/amavisd/ I don't know mailscanner, but amavisd-new is a much more efficient approach for a mailserver, especially at ISP level. The two main differences are 1) Spamassassin (and clamav) get run once per incoming mail, not once for every recipient, and 2) amavisd-new runs as a daemon, so Spamassassin only has to be compiled in Perl once instead of once per incoming message. Anyone who uses above solutions? *Lots* of mailservers use amavisd-new, including many ISPs and 3rd party mail providers, FWIW. -- Clifton -- Clifton Royston -- [EMAIL PROTECTED] / [EMAIL PROTECTED] President - I and I Computing * http://www.iandicomputing.com/ Custom programming, network design, systems and network consulting services
RE: What's the best method to use SA?
-Original Message- From: Monty Ree [mailto:[EMAIL PROTECTED]] Sent: Wednesday, October 04, 2006 12:44 AM To: users@spamassassin.apache.org Subject: What's the best method to use SA?
Hello. I have used SA using with procmail. and clamav + sendmail(libmilter) against virus. But I have found that other related solutions like http://www.mailscanner.info/ or http://www.amavis.org/. I don't know what's the difference or better between SA using procmail or above solutions. more fast or more effective?? Anyone who uses above solutions?

I use sendmail and procmail. I think that combo is very good. Lets you do some neat things.
--Chris
Re: Stock spam in images
Jason Haar wrote: I'm having marvelous luck with FuzzyOCR - but the spammers are learning too. When I first started using it just a couple of months ago, it really whacked the image-based spam. You could see why when gocr file.gif returned nice text that was easy to match against. However, now is a different matter. I just got a lose weight spam 10 minutes ago that gocr returns as: lI__c_tc)r _rc_hc_rihc_Ll _cnLl .h1c_Llic_;cll_ _u__c_c __ihc LI l c htc)hlc_rc)c_c_ B llr_ll l hc r_cp_ _ t4 __cc_'un ic) __'ri_c _ hH3s, t_k _ ,r o_E,y _h K E,_ _ ,_ics r _ sncu)._r. t.ihk). lhirkrr x_)) ' gg __, r _ Krvc)_H t)r r_irk cct .__ _ O _' Y O ___ TE_ E _Lncl nLnn __ mc)R hnrtb That tells me to go to www.realhgh dot org , but their GIF processing munged it enough to slip by gocr Not much FuzzyOCR can do with that :-( A few days ago, someone provided me with an image that returned garbage when using plain 'gocr file'. The trick to better detection is to adjust gocr's -l parameter to get better contrast (and better results). By looping 0...255 you will find a setting which will give you good results for this type of image, and if you start getting a lot of these images, adding another scanset will not add too many cpu cycles to your scan. This new setting will almost certainly give you better results with other images too, so unless you have a really overloaded system, adding another scanset will not 'break the bank'. -- Jorge Valdes
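The "loop over 0...255" idea could be automated along these lines. This is a sketch under assumptions: it shells out to gocr with the -l option mentioned above, it assumes gocr accepts the image path as its last argument as in the gocr file.gif example, and the keyword-counting heuristic and all function names are invented for illustration (FuzzyOCR's own matching is fuzzier than this).

```python
import subprocess
from typing import Callable, Iterable, Sequence, Tuple


def run_gocr(path: str, level: int) -> str:
    """Run gocr on one image with the given -l value and return its text."""
    result = subprocess.run(
        ["gocr", "-l", str(level), path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def count_hits(text: str, keywords: Sequence[str]) -> int:
    """Count how many of the keywords appear in the OCR output."""
    lowered = text.lower()
    return sum(1 for word in keywords if word.lower() in lowered)


def best_level(
    path: str,
    keywords: Sequence[str],
    levels: Iterable[int] = range(0, 256),
    ocr: Callable[[str, int], str] = run_gocr,
) -> Tuple[int, int]:
    """Return (level, hits) for the -l value that finds the most keywords.

    On ties, the lowest level that reached the best count is kept.
    """
    best_hits, best = -1, -1
    for level in levels:
        hits = count_hits(ocr(path, level), keywords)
        if hits > best_hits:
            best_hits, best = hits, level
    return best, best_hits
```

Running best_level("file.gif", ["lose", "weight"]) once on a sample image gives a candidate -l value to try in an extra scanset; a full sweep means 256 gocr runs, so it is meant for offline tuning rather than for every incoming mail.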
bayes_toks.expire.... can I delete these?
I have a ton of bayes_toks.expire files listed in /root/.spamassassin. Is it safe to delete these files? I did check the FAQ regarding manybayestoksexpirefiles, but from what I can tell the directory is not set to use the sticky bit. Here are my ls -la results for the directory:

[EMAIL PROTECTED] .spamassassin]# ls -la
total 12803624
drwx------ 2 root root 163840 Oct 4 11:43 .
drwxr-x--- 6 root root   4096 Oct 4 11:03 ..
-rw------- 1 root root  12288 May 16 2005 auto-whitelist
---snip---

I have also been experiencing "SpamAssassin timed out" errors in my maillog over the past couple of days; would the bayes_toks.expire files have anything to do with this? If not, I will review the FAQs and post a new topic if I need assistance.

Running Fedora Core 1, spamassassin 3.1.0, MailScanner 4.49.7, Perl 5.8.1, MTA - sendmail 8.13.5

Thanks, Derek
-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Re: What's the best method to use SA?
Sendmail/Procmail

/etc/procmailrc:

:0fw
* < 115000
* ! ^(TO|Cc):.(user1noscan|user2noscan|user3noscan)
* ! ^Return-Path: \\
* ! ^List-Id:.\MUNGED.yahoogroups.com\
* ! ^Disposition-Notification-To:.*MUNGED
* ! ^Received:.(domain1.com|domain2.com|domain3.com)
* ! ^To:.*abuse
* ! ^Message-Id:.*(MessageID1|MessageID2|MessageID3)
| /usr/bin/spamc -d 192.168.0.200 -p 789 -t 60
Re: What's the best method to use SA?
I use Exim with the integrated SA ACL. I'm really pleased with how it works. http://www.exim.org/exim-html-4.62/doc/html/spec_html/ch40.html /Andreas
switching from global bayes to per-user bayes
I am looking into switching from a global bayes/awl/setting environment to a per-user environment with MySQL as a backend.

*puts on asbestos suit*

Would anyone care to offer an opinion as to whether, and to what degree, this might make a difference in overall effectiveness? Can anyone back up that opinion with cold hard facts?

Will I be able to migrate small sets of users from global to per-user, or will I have to make the jump for all my end-users/domains at once?

I'd like to preload the bayes db for each user so that it's 'primed' and ready to go. Obviously, it would be preferable to preload with their specific mail, but is it possible to feed bayes for each user with a generic set of spam/ham?
Stupid spammer rules: typos in forged headers
describe QMAIL_TYPO Hand-forged Received header with typos
header   QMAIL_TYPO Received =~ /\.[a-z]{1,4}\s\((?!Qmail)Qm[ail]{3}\)\swith\s/
score    QMAIL_TYPO 1.00

-- John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/ [EMAIL PROTECTED] FALaholic #11174 pgpk -a [EMAIL PROTECTED] key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
--- You are in a maze of twisty little protocols, all written by Microsoft. --
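To check what the QMAIL_TYPO pattern does and does not catch, the same regular expression can be run through Python's re module. The sample Received fragments below are made up for illustration and are not taken from real spam.

```python
import re

# The pattern from the rule above, unchanged apart from Python quoting.
QMAIL_TYPO = re.compile(r"\.[a-z]{1,4}\s\((?!Qmail)Qm[ail]{3}\)\swith\s")

# Hypothetical Received header fragments.
samples = [
    "from a.example.net by mail.example.com (Qmial) with SMTP",  # typo
    "from a.example.net by mail.example.com (Qmail) with SMTP",  # spelled right
]

for received in samples:
    print(bool(QMAIL_TYPO.search(received)))
# prints True, then False
```

The negative lookahead (?!Qmail) is what exempts the correct spelling: any other three-letter arrangement of a, i, and l after "Qm" still matches.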
RE: What's the best method to use SA?
We use SA, ClamAV, Razor, Pyzor, DCC, etc. with amavisd-new and Maia Mailguard. Maia is a great way to imitate some of the big expensive spam filters out there. It gives users a web front end for managing their spam and even their spam score limits. It may be a little more than what you want to give your users, but nonetheless it's a great admin tool for a network admin trying to stop spam. In my case, I can go to the users that are getting the most false negatives and specifically tell Maia "this is spam, NOT ham, now train bayes". That to me is an awesome tool. We also have a spam inbox set up, but nobody uses it, and people still want to complain about getting spam. Check it out; their web server always seems to be down, so look at the Google cached version if you can't get there.

From: Clifton Royston [mailto:[EMAIL PROTECTED] Sent: Wed 10/4/2006 12:51 PM To: Monty Ree Cc: users@spamassassin.apache.org Subject: Re: What's the best method to use SA?

On Wed, Oct 04, 2006 at 04:43:37AM +, Monty Ree wrote: Hello. I have used SA using with procmail. and clamav + sendmail(libmilter) against virus. But I have found that other related solutions like http://www.mailscanner.info/ or http://www.amavis.org/. I don't know what's the difference or better between SA using procmail or above solutions. more fast or more effective??

The original amavis is not well supported AFAIK. However, it has a descendant amavisd-new which is actively developed and supported. http://www.ijs.si/software/amavisd/ I don't know mailscanner, but amavisd-new is a much more efficient approach for a mailserver, especially at ISP level. The two main differences are 1) Spamassassin (and clamav) get run once per incoming mail, not once for every recipient, and 2) amavisd-new runs as a daemon, so Spamassassin only has to be compiled in Perl once instead of once per incoming message.

Anyone who uses above solutions?

*Lots* of mailservers use amavisd-new, including many ISPs and 3rd party mail providers, FWIW.
-- Clifton -- Clifton Royston -- [EMAIL PROTECTED] / [EMAIL PROTECTED] President - I and I Computing * http://www.iandicomputing.com/ Custom programming, network design, systems and network consulting services I did write it and can't remove it for policy reasons. preciate the flack though!
Re: ImageInfo Bug
Stuart Johnston wrote: Dallas, I think there is a bug in the image_size_range function. my $name = $type.'_dems'; Should probably be more like: my $name = "dems_$type"; Thanks, Stuart

Yup. Craig Green made me aware of that last week, and I've been too busy to address it. I'll get it updated on the SARE side shortly. I haven't looked at Theo's sandbox lately, but I'd guess it's incorrect there also.

Thanks, -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
light-grey listing..? lkml filter probs catching too much ham.
I'm having problems filtering a list I'm on (lkml). First I had it on the normal filter, but I had too many false positives. I finally switched it to a white-list, but now many false negatives (spam) get through.

Is there a way to "light-grey" a list -- not a blanket accept-all white-list, but something that temporarily moves the spam high-water mark for that specific email? I.e. instead of it taking X points to be marked as SPAM, it adds 5 points to the threshold needed to mark the message as spam.

I heard that the list owners attempted to tighten the filters and had the same problem -- too many ham emails got trapped. Perhaps it is all the code that gets published to that list? Dunno, but SPAM seems to have something in common with code (or at least with the normal linux-kernel-mailing-list post), and that is making it a hard list to police (clean up).

Does anyone else have stubborn lists like this, or had success in filtering lkml? I even split off code-ish looking posts to a separate folder, but that still didn't stop the false negatives, so I'm not quite sure what makes such a list uniquely difficult to filter.

Not the worst problem -- at least it's confined to that folder, but the various spams that are present make it a bit challenging to read, right in the middle of the tech stuff. Just on the first page of titles (conversations hidden under titles), 2/10 titles are sex-related spams. It's a bit annoying to read through (sigh).

Now why would sex-spammers target lkml readers? Do they think lkml readers are uniquely more likely to respond to sex-spam? (Maybe, given the fascination of the average /. reader and their amusement with pr0n, there could be some basis to the spammer's methods...?)

thanks, -linda
RE: light-grey listing..? lkml filter probs catching too much ham.
Linda Walsh wrote: Is there a way to light-grey a list -- not a blanket accept all, white-list, but something that temporarily moves the spam-high-water mark for that specific email: For mailing lists, I use whitelist_to, which by default subtracts 6 points from the email's score. It works since the emails are all to the mailing list address, and not to mine. This list, for example, gets: whitelist_to users@spamassassin.apache.org
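To make the arithmetic behind this suggestion concrete: subtracting points from a list message's score has the same effect as raising the spam threshold for that message, which is what the "light-grey" request asks for. The following is a minimal Python sketch of that reasoning only, not SpamAssassin code. The function name is made up for illustration, the 5.0 threshold assumes SpamAssassin's default required score, and the -6.0 adjustment is the default figure quoted above.

```python
REQUIRED_SCORE = 5.0       # assumption: SpamAssassin's default required score
WHITELIST_TO_SCORE = -6.0  # adjustment quoted in the reply above (assumed unchanged locally)


def is_spam(raw_score, to_whitelisted_list, required=REQUIRED_SCORE):
    """Return True if a message with this raw score would be tagged as spam.

    `to_whitelisted_list` stands in for "the To/Cc address matched a
    whitelist_to entry"; it is a hypothetical flag for this sketch.
    """
    adjustment = WHITELIST_TO_SCORE if to_whitelisted_list else 0.0
    return raw_score + adjustment >= required


if __name__ == "__main__":
    # A raw 7.0 is spam for ordinary mail, but not for whitelisted list mail.
    print(is_spam(7.0, False), is_spam(7.0, True))
    # List mail needs a raw score of at least 11.0 to be tagged.
    print(is_spam(11.0, True))
```

Under these assumptions the list's effective threshold becomes 11 points rather than 5, so strongly spammy list mail is still tagged while borderline posts pass.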
Re: FuzzyOCR request
decoder wrote the following on 04/10/2006 21:38: Alan Munday wrote: Chris Could you consider adding a configuration parameter which would have the effect of scoring all results as zero? This would allow people to configure FuzzyOCR for their systems in the knowledge that it will not affect the current running state. It will also allow people to test the effects of FuzzyOCR on their current traffic before taking it live. regards Alan This seems like a very good idea, I will implement this as soon as I am able to continue the development again. At the moment I am busy with university stuff but in some weeks I will have more time again :) Best regards, Chris Chris Thank you for considering this. I've been following your developments and looking at how to integrate with my (few) systems. But as I don't have a test environment (until I have built a VMWare one) I was cautious about trying this on one of the live boxes. Zero scoring seemed to be a good way round this. regards Alan
double letter porn
I've been getting lots of porn site spam containing words with doubled letters, like this one: Orrgy pornn parrties! Lotts of sttupid bitchees gangbangged by queue of guyss. annal_nailing and cum__swallowing orgiees. archiive of group_ssex materiall! http://www.teens229mx.com/?lcajuryrpdbejn Most of these hit razor2, and www.teens???mx.com sooner or later shows up on the SURBL and URIBL lists, but nothing seems to catch the misspelled words. Can anybody suggest a rule or ruleset to catch these double-letter obfuscations? I'm using SpamAssassin 3.1.4.
RE: double letter porn
I've been getting lots of porn site spam containing words with doubled letters, like this one: Orrgy pornn parrties! Lotts of sttupid bitchees gangbangged by queue of guyss. annal_nailing and cum__swallowing orgiees. archiive of group_ssex materiall! http://www.teens229mx.com/?lcajuryrpdbejn Most of these hit razor2, and www.teens???mx.com sooner or later shows up on the SURBL and URIBL lists, but nothing seems to catch the misspelled words. Can anybody suggest a rule or ruleset to catch these double-letter obfuscations? I'm using SpamAssassin 3.1.4. Network tests... That hit URIBL_BLACK and the SURBL JP and OB tests. I'm sure a rule *could* be written, but those are common double-letter combinations, so it would be a bit more difficult than it seems. Bret
Re: double letter porn
On 10/4/2006 5:57 PM, Richard Doyle wrote: I've been getting lots of porn site spam containing words with doubled letters, like this one: Can anybody suggest a rule or ruleset to catch these double-letter obfuscations? I'm using Spamassassin 3.1.4. You'd probably need to write a plug-in that used some kind of typo-matching logic to find porno words. Would be a good plug-in actually. Get busy :) -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: bayes_toks.expire.... can I delete these?
Derek Catanzaro wrote: I have a ton of bayes_toks.expire files listed in /root/.spamassassin. Is it safe to delete these files? Yes, provided no expire process is currently running and using one. I did check the FAQ regarding manybayestoksexpirefiles but from what I can tell the directory is not set to use sticky bit. Here is my ls -la results on the directory: snip Running Fedora Core 1 spamassassin 3.1.0 MailScanner 4.49.7 1) run sa-learn --force-expire to fix the immediate problem. 2) prevent future problems by fixing your spamassassin timeout value in MailScanner.cf Anything under 600 seconds is bad news if you use bayes. In fact, I'd set it to 3000 seconds. I use MailScanner myself, and have since the SA 2.31 days, and I've NEVER had MailScanner time out a SA process for any valid reason. I've only had it time out because the timeout value was too short. In this case, MailScanner doesn't know that SA is taking a long time because it's doing its bayes database maintenance. Therefore, it assumes SA is in an infinite loop or some other bogus state (which I've NEVER had happen, nor have I ever even heard of happening to SA), and kills it. If this keeps happening, your bayes database will grow without bound and consume your entire disk. SA NEEDS to expire the bayes tokens at some point, and this is a very slow process. Some history about MS and its timeouts. I've only had one other situation of timeouts other than bayes. When I started using MS, it had a SA timeout value equal to the default RBL timeout in SA. At the time SA just used a fixed 15 second timeout, and MS only gave SA 15 seconds to run. SA didn't do its modern dynamic timeout, so it would always wait 15 seconds, even if it was only waiting on one RBL. Since SA also took a non-zero amount of time to get to the point it invoked the RBL, a dead RBL would always result in SA taking slightly more than 15 seconds to complete. 
Therefore, if an RBL ever failed, MS would kill it just before SA would have given up on the RBL, and you'd wind up with an un-scored message. Needless to say the timeout feature of MailScanner is one of my least favorite features of MS, because it seems it always does the wrong thing.
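The race described above can be written out as plain arithmetic. This is a minimal Python sketch of the reasoning only; it is not MailScanner or SpamAssassin code. The function name, the 0.5 second startup figure, and the outcome strings are assumptions made for illustration; the 15 second and 3000 second values come from the message above.

```python
def timeout_outcome(startup_s, inner_timeout_s, outer_timeout_s):
    """Say which timer fires first when a lookup (e.g. a dead RBL) never answers.

    startup_s:        time the scanner needs before it starts the lookup
    inner_timeout_s:  how long the scanner itself waits on the lookup
    outer_timeout_s:  how long the wrapper lets the whole scan run
    """
    inner_gives_up_at = startup_s + inner_timeout_s
    if outer_timeout_s <= inner_gives_up_at:
        return "killed by outer timeout: message left unscored"
    return "inner timeout fires first: message scored without the lookup"


if __name__ == "__main__":
    # Both timers at 15 s: the wrapper wins by the startup margin.
    print(timeout_outcome(0.5, 15, 15))
    # A generous outer limit lets the scanner give up on its own.
    print(timeout_outcome(0.5, 15, 3000))
```

The point of the sketch is that any outer limit equal to or shorter than the inner wait plus startup time guarantees a kill whenever the lookup hangs, which matches the behaviour described in the message.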
Re: double letter porn
On Wed, 4 Oct 2006, Eric A. Hall wrote: On 10/4/2006 5:57 PM, Richard Doyle wrote: I've been getting lots of porn site spam containing words with doubled letters, like this one: Can anybody suggest a rule or ruleset to catch these double-letter obfuscations? I'm using Spamassassin 3.1.4. You'd probably need to write a plug-in that used some kind of typo-matching logic to find porno words. /\bss?ee?xx?\b/i /\boo?rr?gg?yy?\b/i /\boo?rr?gg?ii?ee?ss?\b/i etc... -- John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/ [EMAIL PROTECTED] FALaholic #11174 pgpk -a [EMAIL PROTECTED] key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- [Small arms] are fundamentally dangerous and their removal from the equation either by control, neutralisation or removal is essential. The first step is to gain information on their numbers and whereabouts. -- the UN, who doesn't want to confiscate guns ---
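The patterns above can be generated from a word list rather than typed by hand. The following is a minimal Python sketch of that idea, not a SpamAssassin rule or plug-in. The function name is made up for illustration, and the negative lookahead is an addition that the patterns in the message do not have: as written, those patterns also match the normal spelling, so the sketch requires at least one doubled letter.

```python
import re


def doubled_letter_pattern(word):
    """Build a regex for `word` with each letter optionally doubled.

    For "sex" the loose part is "ss?ee?xx?", the same shape as the
    patterns in the message above. The (?!word\\b) lookahead rejects
    the plain spelling, so only obfuscated variants match.
    """
    loose = "".join(re.escape(ch) + re.escape(ch) + "?" for ch in word)
    return re.compile(r"\b(?!%s\b)%s\b" % (re.escape(word), loose), re.IGNORECASE)


if __name__ == "__main__":
    sample = "Orrgy pornn parrties! archiive of group_ssex materiall!"
    for word in ("orgy", "porn", "parties", "archive", "sex"):
        hit = doubled_letter_pattern(word).search(sample)
        print(word, "->", hit.group(0) if hit else None)
```

One caveat applies to the original patterns as well as to this sketch: the underscore counts as a word character in both Perl and Python, so `\b` does not match between `_` and `s`, and "group_ssex" in the sample spam is not caught by a `\b`-anchored pattern. Words that legitimately contain a doubled letter would also need separate handling.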
Forged X-Spam headers
I have been noticing the occasional spam slipping past SpamAssassin unscathed lately but have been a bit too busy to pay attention (one spam a day is much better than the 150 each user used to get). I paid a bit more attention to one the other day and noticed it had an X-Spam header before it got to SpamAssassin. For a few seconds I thought that maybe my ISP had started tagging silently, until I noticed that the spam score was -83... Not the positive score it should have been, so I deduced that spammers are forging the X-Spam header to slip by the classification rules. I had a search on the Nabble archive for this list and couldn't find anything specifically about this (it probably got lost in the million results that just about any search phrase produces!) so I am hoping someone can point me at a solution if it's been discussed before. Is there an option for spamass-milter to strip X-Spam headers before the mails are handed to SpamAssassin for processing? If not, is there another milter I will need to use? I guess I can put it in between milter-regex and spamass-milter. Any ideas? Chris M
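What "stripping X-Spam headers" means in practice can be shown on a single message. This is a minimal Python sketch using the standard library email package; it is not a milter and makes no claim about what spamass-milter or milter-regex can be configured to do. The function name and the sample message are made up for illustration.

```python
from email import message_from_string


def strip_x_spam_headers(raw_message):
    """Return the message text with every X-Spam* header removed.

    Header names are compared case-insensitively. Deleting a header
    name removes all of its occurrences, so repeated forged headers
    are dropped as well.
    """
    msg = message_from_string(raw_message)
    for name in list(msg.keys()):
        if name.lower().startswith("x-spam"):
            del msg[name]
    return msg.as_string()


if __name__ == "__main__":
    forged = (
        "From: sender@example.com\n"
        "X-Spam-Status: No, score=-83\n"
        "X-Spam-Flag: NO\n"
        "Subject: hello\n"
        "\n"
        "body text\n"
    )
    print(strip_x_spam_headers(forged))
```

In a real mail path this step would have to run before the filter adds its own headers, so that only locally generated X-Spam headers reach the delivery rules.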
Re: double letter porn
John D. Hardin wrote: On Wed, 4 Oct 2006, Eric A. Hall wrote: On 10/4/2006 5:57 PM, Richard Doyle wrote: I've been getting lots of porn site spam containing words with doubled letters, like this one: Can anybody suggest a rule or ruleset to catch these double-letter obfuscations? I'm using Spamassassin 3.1.4. You'd probably need to write a plug-in that used some kind of typo-matching logic to find porno words. /\bss?ee?xx?\b/i /\boo?rr?gg?yy?\b/i /\boo?rr?gg?ii?ee?ss?\b/i Seeing the same here; some targeted porn spam with doubled-up letters in the subject, usually scoring 2-3 due to various SA tests on rcvd lines, with very short (2 line) bodies and urls that are not surbl, uribl or dob (day old bread) listed yet. Typically they also include somewhat odd adjectives, like audacious, immaculate, etc... I've just been reacting with rules similar to what is suggested above, with some success, but it's got me wondering if there isn't another list that I can find these on. Ken Anderson etc... -- John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/ [EMAIL PROTECTED] FALaholic #11174 pgpk -a [EMAIL PROTECTED] key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- [Small arms] are fundamentally dangerous and their removal from the equation either by control, neutralisation or removal is essential. The first step is to gain information on their numbers and whereabouts. -- the UN, who doesn't want to confiscate guns ---
Re: bayes_toks.expire.... can I delete these?
Matt Kettler wrote: Derek Catanzaro wrote: I have a ton of bayes_toks.expire files listed in /root/.spamassassin. Is it safe to delete these files? Yes, provided no expire process is currently running and using one. I did wind up deleting all of the bayes_toks.expire files; there were hundreds. 1) run sa-learn --force-expire to fix the immediate problem. After deleting the bayes_toks.expire files I ran sa-learn --force-expire and received the result below, and it just stayed there for at least 20 minutes so I forced it to stop. Is this normal behavior? Was I too impatient with the process? My bayes_toks file is 321MB, not sure if that is part of the issue. .spamassassin]# sa-learn --force-expire bayes: synced databases from journal in 0 seconds: 1611 unique entries (2099 total entries) 2) prevent future problems by fixing your spamassassin timeout value in MailScanner.cf Anything under 600 seconds is bad news if you use bayes. In fact, I'd set it to 3000 seconds. I use MailScanner myself, and have since the SA 2.31 days, and I've NEVER had MailScanner time out a SA process for any valid reason. I've only had it time out because the timeout value was too short. Matt, After posting this to the list I did some more research online and found the following thread which you responded to. I have applied the settings listed in this thread to my MS/SA setup. Do these settings still apply in your opinion? The thread recommends a minimum of 60 seconds for the spamassassin timeout value; mine is set to 75. Based on what you are saying above I believe I need to increase the spamassassin timeout dramatically, can you confirm? Since I deleted the bayes_toks.expire files there has been 1 .expire file generated already, so I'm assuming that tells me my timeout is still too low? http://mail-archives.apache.org/mod_mbox/spamassassin-users/200410.mbox/[EMAIL PROTECTED] Thanks for all of the information you provided. I really appreciate the assistance. 
Thanks, Derek