Re: multiple instances, simplification
Kris Deugau wrote:

Gary Smith wrote:

Instead of running multiple SA servers, is it possible to run a single consolidated SA server where only the userprefs are different for each spamc caller (given that the local config will override the global config) AND still use a single bayes DB? We use a clustered MySQL instance for bayes, and I don't want to have to worry about a bayes DB per user. The big differences between the instances are mostly the required_score threshold, a few score overrides and a few custom rules. Any recommendations on how to handle this? It would be really nice to use a single config for all SA instances, with the only difference being the user config.

If all of the differences are in required_score, custom scores on a few rules, a few fairly trivial rules, etc, then yes, you should be able to do this. Either create real system users filter1, filter2, etc or read up on spamd's virtual user support. A quick read of spamd's man page shows a little clearer and more coherent set of options than I recall from ~2.x. -x and --virtual-config-dir are probably good places to start. -kgd

Why don't you just run 3 instances of spamd, each listening on different ports/sockets and each with its own configuration:

    spamd --siteconfigpath=/etc/spam1 --socketpath=/tmp/spam1.sock --port=783
    spamd --siteconfigpath=/etc/spam2 --socketpath=/tmp/spam2.sock --port=784
    spamd --siteconfigpath=/etc/spam3 --socketpath=/tmp/spam3.sock --port=785

This way you can enable/disable different plugins for each config, as well as having totally different configurations in each instance. Afterwards it's just a matter of calling the right instance from your MDA by choosing the proper socket or TCP port. Since you use MySQL for Bayes, you can configure each instance with the same configuration so that they all access the same database.

And because it's just for testing, don't forget to add --min-children=1 --max-children=1 so that each instance only runs one scanner instance, thus conserving RAM.

-- Jorge Valdes jval...@intercom.com.sv
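For anyone scripting this, here is a minimal sketch of generating the three command lines above instead of typing them by hand. The flags are the ones quoted in the message; the helper name spamd_command and the /etc/spamN and /tmp/spamN.sock naming scheme are just illustrative assumptions, and the sketch only builds the argument lists, it does not start anything.

```python
from typing import List


def spamd_command(index: int, base_port: int = 783, children: int = 1) -> List[str]:
    """Build the argv for spamd instance number `index` (1-based).

    Hypothetical helper: paths and port numbering follow the example
    in the message above, nothing here is taken from spamd itself.
    """
    return [
        "spamd",
        f"--siteconfigpath=/etc/spam{index}",
        f"--socketpath=/tmp/spam{index}.sock",
        f"--port={base_port + index - 1}",
        # One scanner child per instance, as suggested for test setups.
        f"--min-children={children}",
        f"--max-children={children}",
    ]


# Three instances, matching the three commands in the message.
commands = [spamd_command(i) for i in (1, 2, 3)]

if __name__ == "__main__":
    for argv in commands:
        print(" ".join(argv))
```

Each list could be handed to an init script or process supervisor of your choice; how you launch it is up to your system.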
Bad SARE Rule
I have just discovered a small bug in 70_sare_header.cf:

    header SARE_FREE_WEBM_RuMail From =~ /[EMAIL PROTECTED]/i

which should be:

    header SARE_FREE_WEBM_RuMail From =~ /[EMAIL PROTECTED]/i

otherwise it will also match other domains such as @mail.runner.com, etc.

-- Jorge Valdes Intercom El Salvador [EMAIL PROTECTED]
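The archive has redacted both regexes, so the actual before/after patterns are not recoverable here. As a rough illustration of the kind of problem being described (an unanchored domain pattern matching a longer domain), here is a small sketch. Both patterns below are my own hypothetical stand-ins, not the real SARE rule or its fix.

```python
import re

# Hypothetical stand-ins: the real patterns were redacted by the archive.
# Unanchored: matches any address whose domain merely starts with "mail.ru".
UNANCHORED = re.compile(r"@mail\.ru", re.IGNORECASE)

# Anchored: the domain must end right after "ru" (no further letters,
# digits, dots or hyphens may follow).
ANCHORED = re.compile(r"@mail\.ru(?![\w.-])", re.IGNORECASE)

samples = [
    "Ivan <ivan@mail.ru>",
    "Bob <bob@mail.runner.com>",
]

if __name__ == "__main__":
    for s in samples:
        print(s, bool(UNANCHORED.search(s)), bool(ANCHORED.search(s)))
```

The unanchored pattern hits both samples, while the anchored one only hits the first.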
Re: OT: DNS restrictions for a mail server
Matus UHLAR - fantomas wrote:

The point of MX is to point to hosts that receive mail, if you send mail to someone. The point of PTR is to provide host name when you receive mail from someone. The PTR has NOTHING to do with MX records and vice versa!

So maybe there should be a new type of DNS record: MS (name suggestions welcomed :) ) to let everyone know the server is an _outbound_ only mail server: a server that sends mail for a domain that _may_ also receive mail for the domain. This is a lot simpler than having to parse an SPF record, which may also require additional DNS queries.

DNS Configuration Examples:

1.- If a company has a single mail server for both inbound and outbound, it would be required for them to set up both an MX record and an MS record, i.e.:

    example.com IN MX 10 mail
    example.com IN MS 10 mail

2.- If a company has different servers for inbound and outbound mail, they could set up different records to allow for all servers to be specified:

    example.com IN MX 10 mail1
    example.com IN MX 20 mail2
    example.com IN MS 10 smtp1
    example.com IN MS 20 smtp2

When a mail server gets a connection, it would ask for the PTR record in order to check the HELO|EHLO argument and get the host's name; when the MAIL FROM: command is received, the domain part could be used to get the MS record and optionally reject the sender if the hostname from the connection is not listed in the MS record list. If we do allow the sender, that could later trigger a SpamAssassin rule that says that the envelope sender is sending mail from a host that is not allowed. whitelist_ms a.b.c.d/x configuration directives could be used to bypass the rule.

Also, DNSBLs could benefit from these records, as exceptions could be generated for these records in the same manner that MX records generate exceptions.

I understand that any new type of DNS record must be discussed, and this is not the proper list to do it in, but this discussion is probably appropriate since it's: OT.

-- Jorge Valdes
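To make the proposed check a bit more concrete, here is a minimal sketch of the MAIL FROM step described above. The MS record type does not exist, so no resolver can look it up; the dictionary FAKE_MS_RECORDS stands in for DNS, and the function name and the "no MS record means no opinion" behaviour are my assumptions, not part of the proposal.

```python
from typing import Dict, List, Optional

# Stand-in for DNS: domain -> hosts listed in its (hypothetical) MS records.
FAKE_MS_RECORDS: Dict[str, List[str]] = {
    "example.com": ["smtp1.example.com", "smtp2.example.com"],
}


def _norm(name: str) -> str:
    """Lower-case a host name and drop any trailing dot."""
    return name.lower().rstrip(".")


def sender_host_allowed(
    mail_from: str,
    ptr_hostname: str,
    ms_records: Optional[Dict[str, List[str]]] = None,
) -> bool:
    """Return True if the connecting host may send for the envelope domain.

    mail_from    -- envelope sender address from MAIL FROM:
    ptr_hostname -- host name obtained from the PTR lookup of the client IP
    """
    records = FAKE_MS_RECORDS if ms_records is None else ms_records
    domain = _norm(mail_from.rsplit("@", 1)[-1])
    allowed = records.get(domain)
    if allowed is None:
        # Assumption: a domain that publishes no MS records is not penalised.
        return True
    return _norm(ptr_hostname) in {_norm(h) for h in allowed}
```

A False result is where an MTA could reject, or where a SpamAssassin rule could fire later if the message is accepted anyway.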
sa-compile errors
I just installed SpamAssassin 3.2.5 and after doing a sa-update and sa-compile I get the following: Illegal octal digit '8' ignored at /usr/local/bin/sa-compile line 631, $fh line 2436. Wide character in print at /usr/local/bin/sa-compile line 385, $fh line 2436. They compile w/o errors, but this does seem strange... -- Jorge Valdes Intercom El Salvador
Segmentation Fault
I am currently running SA 3.2.4 on perl 5.8.8, and want to upgrade perl to 5.10.0. After compiling perl 5.10.0 from source and rebuilding SA 3.2.4 with the newly compiled perl binary, satisfying all Required Module dependencies, I get a segmentation fault when trying to load the compiled regexes:

    # /usr/local/bin/spamassassin --lint --debug
    ...
    [29188] dbg: zoom: loading compiled ruleset from /var/lib/spamassassin/compiled/3.002004
    Segmentation fault
    #

Are compiled rulesets perl version specific? I want to know this since I have spamd running with the compiled rulesets, and don't want to mess up my current config. If the compiled rulesets are perl specific, then I can go ahead and recompile and restart spamd; otherwise, I think I found a bug, since if there is a version mismatch, spamassassin should be able to detect this and, in this case, go on without the compiled rulesets.

-- Jorge Valdes Intercom El Salvador [EMAIL PROTECTED]
Re: Feedback on 3.2.4
Rick Macdougall wrote:

Skip wrote:

Other than the initial reports of performance boost from 3.2.4, I haven't seen much discussion on it as yet. Perhaps it is still too soon to know, but has anyone been seeing other benefits - or identified potential problems?

No problems with it at all here (around 7 servers upgraded) and the performance is greatly increased. I went from a 1.4 second average scan time to 0.6 seconds average. HTH, Rick

Is this without network tests? Because on my server I had:

Begin   : 2008-01-01
End     : 2008-01-15
Summary : 3.1.8

  Cnt       %  Average      Min      Max
-----  ------  -------  -------  -------
18968   46.2%    7.837    1.861   10.000
16640   40.6%   13.654   10.001   19.999
 2916    7.1%   23.892   20.003   30.000
 1379    3.4%   38.132   30.002   59.882
  184    0.4%   74.994   60.041   89.753
   37    0.1%   99.552   90.282  118.884
  904    2.2%  154.578  120.272  364.923

Begin   : 2008-01-21
End     : 2008-01-24
Summary : version 3.2.4

  Cnt       %  Average      Min      Max
-----  ------  -------  -------  -------
 5302   44.9%    7.431    3.872   10.000
 4737   40.1%   13.643   10.002   19.998
  869    7.4%   24.003   20.008   29.982
  555    4.7%   41.017   30.001   59.947
  126    1.1%   72.529   60.201   89.941
   24    0.2%  101.170   90.641  118.022
  201    1.7%  154.700  120.454  188.119

Going by the percentages alone, scan time is roughly the same on exactly the same hardware.

-- Jorge Valdes
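As a cross-check on the "roughly the same" reading, the per-bucket counts and averages above can be collapsed into one overall mean per version. This is only a sketch of that arithmetic; the variable and function names are mine, and the numbers are copied from the two summaries.

```python
# (message count, average scan time in seconds) for each bucket,
# copied from the two summaries above.
SA_318 = [
    (18968, 7.837), (16640, 13.654), (2916, 23.892), (1379, 38.132),
    (184, 74.994), (37, 99.552), (904, 154.578),
]
SA_324 = [
    (5302, 7.431), (4737, 13.643), (869, 24.003), (555, 41.017),
    (126, 72.529), (24, 101.170), (201, 154.700),
]


def weighted_mean(buckets):
    """Overall mean scan time: sum(count * average) / sum(count)."""
    total = sum(count for count, _ in buckets)
    return sum(count * avg for count, avg in buckets) / total


if __name__ == "__main__":
    print("3.1.8: %.2f s over %d messages" % (weighted_mean(SA_318), sum(c for c, _ in SA_318)))
    print("3.2.4: %.2f s over %d messages" % (weighted_mean(SA_324), sum(c for c, _ in SA_324)))
```

Both weighted means land near 16 seconds, which agrees with the bucket percentages showing no real change between the two versions on this box.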
Re: Fwd: FuzzyOcr - how do I teach it?
Brian Wilson wrote: On Feb 20, 2007, at 6:36 PM, Robert S wrote: I have just installed FOCR 3.5.1 with the hashdb option. I have been receiving image spams about China Fruits Corporation which are cleverly designed not to contain words in the words list. How do I insert the hash into the database and label this image as spam? I have tried - unsuccessfully: fuzzy-find --score=10 --learn-spam --verbose 367563:437:282:32::49:1:18:17:55642::44:40:7:37:54950::218:144:172:169:1131::96:99:179:107:1094::100:122:122:115:1093::156:136:162:145:1066 (I got the hash score from running spamassassin -D message) and fuzzy-find --score=10 --learn-spam 'notary_public.gif' I'd like to avoid tampering with the words list to avoid FPs. Could somebody please tell me where I'm going wrong. It would be nice if images could be automatically stored in the hashdb as spam if SA gives them a positive score, but FOCR does not. I have the same problem as you, so you are not alone. I first deleted the hash using fuzzy-find to make sure it didn't exist in either hash, then added it with a score of 10. I re-ran spamassassin with debug on for FuzzyOcr and it did not see the entry in the spam db. I even compared the hashes and they were the same: % fuzzy-find --delete 278502:292:319:128::203:248:219:231:26298::202:200:236:205:25148::247:249:185:241:16996::192:236:242:224:16482::136:34:15:62:630::108:30:158:68:410 Img =278502 292x319x128 % fuzzy-find --learn-spam --score=10 278502:292:319:128::203:248:219:231:26298::202:200:236:205:25148::247:249:185:241:16996::192:236:242:224:16482::136:34:15:62:630::108:30:158:68:410 Img =278502 292x319x128 Rerun the spam through SA (China Fruits also: http://bubba.org/spam/) Adding key to database... [1548] dbg: FuzzyOcr: Not enough OCR Hits without space stripping, doing second matching pass... [1548] info: FuzzyOcr: Message is ham, saving... 
[1548] info: FuzzyOcr: Adding Hash to /etc/mail/spamassassin/FuzzyOcr.safe.db with score 0
[1548] dbg: FuzzyOcr: Digest: 278502:292:319:128::203:248:219:231:26298::202:200:236:205:25148::247:249:185:241:16996::192:236:242:224:16482::136:34:15:62:630::108:30:158:68:410

Remember that in order for things to work right, the safe database is checked first. The rationale behind this is that if an image fingerprint is found here, there is no need to do OCR. If you already have the image learned as HAM, you must delete it first, then optionally add it to the SPAM database.

Jorge.

-- -BEGIN GEEK CODE BLOCK- Name: Jorge Valdes EMail: jorgeatjoval.info Version: 3.12 GED/J d+(-) s:+ a+ C++ ULS$ P$ L++ E--- W+++ N+ o? K- w+ M-@ V+ PS- PE+ Y? PGP-@ t++ 5@ X++ R tv+ b+ DI D? G e++ h r+++ y+++ -END GEEK CODE BLOCK-
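The lookup order described above can be modelled in a few lines. This is a toy model only, using plain dictionaries for the two hash databases; it is not FuzzyOcr's code, and the function names classify_image and learn_spam are made up for the illustration.

```python
from typing import Callable, Dict, Tuple


def classify_image(
    digest: str,
    safe_db: Dict[str, float],
    spam_db: Dict[str, float],
    run_ocr: Callable[[str], float],
) -> Tuple[str, float]:
    """Check the safe db first, then the spam db, and only then do OCR."""
    if digest in safe_db:
        # Known-good fingerprint: OCR is skipped entirely.
        return ("safe", safe_db[digest])
    if digest in spam_db:
        return ("spam", spam_db[digest])
    return ("ocr", run_ocr(digest))


def learn_spam(
    digest: str, score: float,
    safe_db: Dict[str, float], spam_db: Dict[str, float],
) -> None:
    """Delete any safe entry first, otherwise it would keep winning."""
    safe_db.pop(digest, None)
    spam_db[digest] = score
```

This shows why adding a hash to the spam database has no visible effect while the same hash is still sitting in the safe database.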
Re: lint errors
Robert Fitzpatrick wrote: I get the following lint errors: esmtp# spamassassin --lint Subroutine FuzzyOcr::O_NONBLOCK redefined at /usr/local/lib/perl5/5.8.6/Exporter.pm line 65. at /usr/local/lib/perl5/5.8.6/mach/POSIX.pm line 19 [98248] warn: FuzzyOcr: Cannot find executable for pamthreshold [98248] warn: FuzzyOcr: Cannot find executable for tesseract I found this regarding the first one, sounds like it can be ignored? Not sure about the other two. http://www.nabble.com/lint-error-on-FuzzyOcr-3.5.0-rc1-t2906332.html The other two are warnings from FuzzyOcr that it could not find the executables for those programs. You could ignore them and you should still be fine, as long as you still have a scanner available (ocrad|gocr). Jorge.
Re: spammers dodging OCR
Gary V wrote: This morning I received my copy of networkworld. Here is an interesting article: http://www.networkworld.com/columnists/2006/103006buzz-spammers-dodging-ocr.html Gary V FuzzyOcr (devel version) is already catching these... has been for a while now. -- Jorge Valdes
Re: ImageInfo vs FuzzyOCR performance?
Michael Scheidell wrote:

-Original Message-
From: Jorge Valdes [mailto:[EMAIL PROTECTED]
Sent: Friday, October 27, 2006 5:12 PM
To: users@spamassassin.apache.org
Subject: Re: ImageInfo vs FuzzyOCR performance?

SPAM Results: 3936 Message(s) 49.83% 19.399 Average Score

3343 Time(s)  7.50%  84.93%  Hit Rule: BAYES_99
3068 Time(s)  6.88%  77.95%  Hit Rule: HTML_MESSAGE
1655 Time(s)  3.71%  42.05%  Hit Rule: FUZZY_OCR
1527 Time(s)  3.42%  38.80%  Hit Rule: SARE_GIF_ATTACH
1411 Time(s)  3.16%  35.85%  Hit Rule: URIBL_BLACK
1274 Time(s)  2.86%  32.37%  Hit Rule: URIBL_BLACK_OVERLAP
1271 Time(s)  2.85%  32.29%  Hit Rule: MIME_HTML_ONLY
1215 Time(s)  2.72%  30.87%  Hit Rule: URIBL_JP_SURBL
1187 Time(s)  2.66%  30.16%  Hit Rule: RCVD_IN_BL_SPAMCOP_NET
1184 Time(s)  2.66%  30.08%  Hit Rule: SARE_GIF_STOX

What do you use to get those stats?

This is from a custom logwatch script that runs every morning... http://www.joval.info/scripts/spamd

Jorge

-- Jorge Valdes Intercom El Salvador [EMAIL PROTECTED] voz: ++(503) 2278-5068 fax: ++(503) 2265-7025
Re: ImageInfo vs FuzzyOCR performance?
Jeff Chan wrote:

Does anyone have any recent feedback about the performance of ImageInfo versus FuzzyOCR about detecting stock image spams (or any others)? Does FuzzyOCR catch significantly more spams than ImageInfo? Cheers, Jeff C.

I may be biased, as I help in FuzzyOcr development, but I do use both. ImageInfo is fine and will get you part of the way there, but FuzzyOcr hits more often. Scanning ~8K msg/day, FuzzyOcr hits ~1600 times and ImageInfo hits 150 times on average. On my system, here are the top 10 rule hits from yesterday:

SPAM Results: 3936 Message(s) 49.83% 19.399 Average Score

3343 Time(s)  7.50%  84.93%  Hit Rule: BAYES_99
3068 Time(s)  6.88%  77.95%  Hit Rule: HTML_MESSAGE
1655 Time(s)  3.71%  42.05%  Hit Rule: FUZZY_OCR
1527 Time(s)  3.42%  38.80%  Hit Rule: SARE_GIF_ATTACH
1411 Time(s)  3.16%  35.85%  Hit Rule: URIBL_BLACK
1274 Time(s)  2.86%  32.37%  Hit Rule: URIBL_BLACK_OVERLAP
1271 Time(s)  2.85%  32.29%  Hit Rule: MIME_HTML_ONLY
1215 Time(s)  2.72%  30.87%  Hit Rule: URIBL_JP_SURBL
1187 Time(s)  2.66%  30.16%  Hit Rule: RCVD_IN_BL_SPAMCOP_NET
1184 Time(s)  2.66%  30.08%  Hit Rule: SARE_GIF_STOX

Jorge Valdes
Re: Stock spam in images
Jason Haar wrote: I'm having marvelous luck with FuzzyOCR - but the spammers are learning too. When I first started using it just a couple of months ago, it really whacked the image-based spam. You could see why when gocr file.gif returned nice text that was easy to match against. However, now is a different matter. I just got a lose weight spam 10 minutes ago that gocr returns as: lI__c_tc)r _rc_hc_rihc_Ll _cnLl .h1c_Llic_;cll_ _u__c_c __ihc LI l c htc)hlc_rc)c_c_ B llr_ll l hc r_cp_ _ t4 __cc_'un ic) __'ri_c _ hH3s, t_k _ ,r o_E,y _h K E,_ _ ,_ics r _ sncu)._r. t.ihk). lhirkrr x_)) ' gg __, r _ Krvc)_H t)r r_irk cct .__ _ O _' Y O ___ TE_ E _Lncl nLnn __ mc)R hnrtb That tells me to go to www.realhgh dot org , but their GIF processing munged it enough to slip by gocr Not much FuzzyOCR can do with that :-( A few days ago, someone provided me with an image that returned garbage when using plain 'gocr file'. The trick to better detection is to adjust gocr's -l parameter to get better contrast (and better results). By looping 0...255 you will find a setting which will give you good results for this type of image, and if you start getting a lot of these images, adding another scanset will not add too many cpu cycles to your scan. This new setting will almost certainly give you better results with other images too, so unless you have a really overloaded system, adding another scanset will not 'break the bank'. -- Jorge Valdes
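The "loop 0...255 over -l" idea can be sketched roughly as below. The scoring rule (count how many expected words show up in the OCR text) and all function names are my own assumptions for illustration; FuzzyOcr's real matching is fuzzier than this. The OCR call is passed in as a function so the search logic can be tried without gocr installed.

```python
from typing import Callable, Iterable, List, Tuple


def gocr_command(image_path: str, level: int) -> List[str]:
    """argv for one gocr run at a given -l value (not executed here)."""
    return ["gocr", "-l", str(level), image_path]


def score_text(text: str, words: Iterable[str]) -> int:
    """Crude score: number of expected words found in the OCR output."""
    lowered = text.lower()
    return sum(1 for w in words if w.lower() in lowered)


def best_grey_level(
    run_ocr: Callable[[int], str],
    words: Iterable[str],
    levels: Iterable[int] = range(0, 256),
) -> Tuple[int, int]:
    """Try every level and return (level, score) of the best result.

    On ties the lowest level wins, because only a strictly better
    score replaces the current best.
    """
    wanted = list(words)
    best_level, best_score = -1, -1
    for level in levels:
        s = score_text(run_ocr(level), wanted)
        if s > best_score:
            best_level, best_score = level, s
    return best_level, best_score
```

In practice run_ocr would wrap something like subprocess.run(gocr_command(path, level), capture_output=True, text=True).stdout. Once a good value is found for a family of images, it can go into an extra scanset rather than being searched for on every message.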
Re: Infuriating gif spam...
Steve [Spamassasin] wrote: I've been getting a _lot_ of spam recently which has been defeating my spamassassin configuration - all of it has the same general form... A message with auto-generated prose and an image. I installed FuzzyOCR and this helped, but one particular variant still slips through. The problematic spams all embed a GIF image which confuses gocr (in spite of being easily human-readable) - though I'm not sure why. Three images which defeat FuzzyOCR for me are: http://temporary.shic.dynalias.net/Evil_Spam_Samples.zip I would like to know if there is a straightforward way either (a) to configure FuzzyOCR to decode the text, or (b), assuming that is hard, a way to identify this kind of 'strange' GIF and apply a static score to them (at least as a temporary measure?) Thanks in advance for any pointers... There are multiple images in these gifs, and because the first image is 'junk', sending this image through gocr will yield no results. The problem is that you have to scan all images to find the text. Try this with each image: convert -append News.gif pnm:- | gocr - I have an updated version of the FuzzyOcr plugin that has this and other improvements available here: http://www.joval.info/proj/FuzzyOcr.html -- Jorge Valdes Intercom El Salvador [EMAIL PROTECTED]
Hidden Option?
Hi, just wanted to let everyone know that I found a spamd option that cannot be configured via the command line: server-scale-period

According to the documentation, this option sets how long the system will wait before determining whether a new child is spawned; the current default is 2 seconds. In my case, I wanted to wait longer in order not to spawn a child only to have it killed a couple of seconds later when the min-spare children became available again.

I found out that the only way to change this was in the spamd script. I added the option manually at around line 195:

    'server-scale-period=i' => \$opt{'server-scale-period'},

Now I can set the option to my taste.

-- Jorge Valdes Intercom El Salvador [EMAIL PROTECTED]
Error when starting spamd 3.1.3
Hi, I get the following error when starting spamd:

    error: Insecure dependency in `` while running with -T switch at /usr/local/lib/perl5/site_perl/5.8.6/Sys/Hostname/Long.pm line 91, GEN11 line 222.

System: Solaris 9/sparc, Perl 5.8.6

This does not affect general operation, but it is annoying to see every time I restart spamd due to option changes and/or configuration changes.

-- Jorge Valdes Intercom El Salvador [EMAIL PROTECTED]
Re: Error when starting spamd 3.1.3
Rosenbaum, Larry M. wrote:

From: Jorge Valdes [mailto:[EMAIL PROTECTED]

Hi, I get the following error when starting spamd:

    error: Insecure dependency in `` while running with -T switch at /usr/local/lib/perl5/site_perl/5.8.6/Sys/Hostname/Long.pm line 91, GEN11 line 222.

System: Solaris 9/sparc, Perl 5.8.6

This does not affect general operation, but it is annoying to see every time I restart spamd due to option changes and/or configuration changes.

Try editing Long.pm and replacing this line:

    my $tmp = `hostname` . '.' . `domainname`;

with this:

    my $tmp = `hostname`;
    my $tmp2 = `domainname`;
    $tmp .= '.' . $tmp2;

Thanks, that did the trick!!

-- Jorge Valdes [EMAIL PROTECTED]
Re: SA 3.04 and RHEL4, Net::DNS isn't working
Steven Stern wrote:

On a brand new RHEL4 installation, I'm having problems with Net::DNS:

    debug: is Net::DNS::Resolver available? yes
    debug: Net::DNS version: 0.51
    debug: trying (3) apache.org...
    debug: looking up NS for 'apache.org'
    debug: NS lookup of apache.org failed horribly => Perhaps your resolv.conf isn't pointing at a valid server?
    debug: All NS queries failed => DNS unavailable (set dns_available to override)
    debug: is DNS available? 0

Dig is able to find apache.org. I've seen some posts on downgrading Net::DNS, but I can't find explicit instructions on how to do it. I installed it via CPAN inside perl.

Steven, I use a local DNS cache on my machine, and this for some reason is confusing the tests. When I configure the server to use real DNS servers, that test passes without problems, so I thought it was just a problem with how the test was designed. I force installed the upgrade and added the following to my local.cf:

    #
    ## Force DNS
    ##
    dns_available yes

Bingo... that did the trick, and now DNS checks are enabled and I have not had problems with my setup. I even changed the configuration back to use my local DNS cache and still have not seen problems... Hope it helps.

-- Jorge Valdes Intercom El Salvador [EMAIL PROTECTED]