Re: multiple instances, simplification

2010-04-16 Thread Jorge Valdes

Kris Deugau wrote:
 Gary Smith wrote:
 Instead of running multiple SA servers, is it possible to run a single
consolidated SA server where only the user prefs differ for each
spamc caller (given that the local config will override the global
config) AND still use a single Bayes DB?  We use a clustered MySQL
instance for Bayes, and I don't want to have to worry about a Bayes DB
per user.

 The big difference between the instances is mostly the
required_score threshold, a few score overrides and a few custom rules.

 Any recommendations on how to handle this?  It would be really nice to
use a single config for all SA instances, with the only difference
being the user config.

 If all of the differences are in required_score, custom scores on a few
rules, a few fairly trivial rules, etc, then yes, you should be able to
do this.

 Either create real system users filter1, filter2, etc. or read up on
spamd's virtual user support.  A quick read of spamd's man page shows a
somewhat clearer and more coherent set of options than I recall from ~2.x.

 -x and --virtual-config-dir are probably good places to start.

 -kgd

Why don't you just run 3 instances of spamd, each listening on different
ports/sockets and each with their own configuration:

spamd --siteconfigpath=/etc/spam1 --socketpath=/tmp/spam1.sock --port=783
spamd --siteconfigpath=/etc/spam2 --socketpath=/tmp/spam2.sock --port=784
spamd --siteconfigpath=/etc/spam3 --socketpath=/tmp/spam3.sock --port=785

This way you can enable/disable different plugins for each config as
well as having totally different configurations in each instance.
Afterwards it's just a matter of calling the right instance from your
MDA by choosing the proper socket or tcp-port.

Since you use MySQL for Bayes, you can give each instance the same
Bayes configuration so that they all access the same database. And
because it's just for testing, don't forget to add --min-children=1
--max-children=1 so that each instance runs only one scanner child,
thus conserving RAM.
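
A minimal sketch of that setup as a dry run: the helper below only prints the command lines so they can be reviewed before anything is started. The paths, ports and flags mirror the commands quoted in this message; the function name spamd_cmds is made up for illustration.

```sh
#!/bin/sh
# Print one spamd command line per instance (dry run; nothing is started).
# Instance numbering, paths and ports follow the example in this message.
spamd_cmds() {
    n=1
    port=783
    while [ "$n" -le "$1" ]; do
        echo "spamd --siteconfigpath=/etc/spam$n --socketpath=/tmp/spam$n.sock --port=$port --min-children=1 --max-children=1"
        n=$((n + 1))
        port=$((port + 1))
    done
}

spamd_cmds 3
```

Once the printed lines look right, they can be pasted into an init script or run by hand.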

--
Jorge Valdes
jval...@intercom.com.sv




Bad SARE Rule

2008-10-24 Thread Jorge Valdes
I have just discovered a small bug in 70_sare_header.cf:

header SARE_FREE_WEBM_RuMail From =~ /[EMAIL PROTECTED]/i

which should be:

header SARE_FREE_WEBM_RuMail From =~ /[EMAIL PROTECTED]/i

otherwise it will also match other addresses such as @mail.runner.com.
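
The actual patterns are hidden by the archive's address obfuscation, so the two regexes below are stand-ins (assumptions, not the SARE rule text). They only illustrate the kind of problem described: a pattern that is not anchored after the domain also hits longer domains that merely start the same way.

```sh
#!/bin/sh
# Hypothetical patterns; the real ones are obfuscated in the archive.
LOOSE='@mail\.ru'     # no anchor after the domain
STRICT='@mail\.ru$'   # domain must end the address

# Succeed if address $2 matches extended regex $1 (case-insensitive).
matches() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

for addr in user@mail.ru user@mail.runner.com; do
    for name in LOOSE STRICT; do
        eval "pattern=\$$name"
        if matches "$pattern" "$addr"; then
            echo "$name hits $addr"
        else
            echo "$name misses $addr"
        fi
    done
done
```

A real From header can carry a display name or angle brackets, so an end-of-string anchor is a simplification here.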

-- 
Jorge Valdes
Intercom El Salvador
[EMAIL PROTECTED]




Re: OT: DNS restrictions for a mail server

2008-10-21 Thread Jorge Valdes
Matus UHLAR - fantomas wrote:

  The point of MX is to point to hosts that receive mail, if you send mail to
  someone.
 
  The point of PTR is to provide host name when you receive mail from someone.
 
  The PTR has NOTHING to do with MX records and vice versa!

   
So maybe there should be a new type of DNS record: MS (name suggestions
welcome  :)  ) to let everyone know the server is an _outbound_ mail
server: a server that sends mail for a domain and that _may_ also receive
mail for the domain. This is a lot simpler than having to parse an SPF
record, which may also require additional DNS queries.

DNS Configuration Examples:

1.- If a company has a single mail server for both inbound and outbound,
they would need to set up both an MX record and an MS record, e.g.:

example.com IN MX 10 mail
example.com IN MS 10 mail

2.- If a company has different servers for inbound and outbound mail,
they could set up separate records so that all servers can be specified:

example.com IN MX 10 mail1
example.com IN MX 20 mail2
example.com IN MS 10 smtp1
example.com IN MS 20 smtp2

When a mail server gets a connection, it would look up the PTR record to
get the host's name and check the HELO/EHLO argument. When the MAIL FROM:
command is received, the domain part could be used to fetch the MS
records, and the sender could optionally be rejected if the connecting
hostname is not listed among them. If the sender is allowed anyway, that
could later trigger a SpamAssassin rule saying that the envelope sender
is sending mail from a host that is not allowed. A whitelist_ms
a.b.c.d/x configuration directive could be used to bypass the rule.
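
The decision step described above can be sketched as follows. MS is only the record type proposed in this message and does not exist in real DNS, so the list of MS hosts is passed in by the caller rather than looked up; the function name ms_check and the host names are illustrative.

```sh
#!/bin/sh
# ms_check PTR_HOSTNAME MS_HOST...
# Prints "accept" if the connecting host's PTR name is one of the MS hosts
# published for the envelope sender's domain, otherwise "reject".
# Simplification: plain string comparison, no case folding or trailing dots.
ms_check() {
    ptr_host=$1
    shift
    for ms_host in "$@"; do
        if [ "$ptr_host" = "$ms_host" ]; then
            echo accept
            return 0
        fi
    done
    echo reject
    return 1
}

# Example 2 from this message: smtp1 and smtp2 are the outbound hosts.
ms_check smtp1.example.com smtp1.example.com smtp2.example.com
ms_check mail1.example.com smtp1.example.com smtp2.example.com || true
```

A receiving server could treat "reject" as a hard refusal or just pass it on as a hint for a later scoring rule, as suggested above.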

Also, DNSBLs could benefit from these records, as exceptions could be
generated for them in the same manner that MX records generate
exceptions. I understand that any new DNS record type must be discussed,
and this is not the proper list to do it on, but the discussion is
probably appropriate here since the thread is marked OT.

-- Jorge Valdes



sa-compile errors

2008-09-12 Thread Jorge Valdes
I just installed SpamAssassin 3.2.5, and after running sa-update and
sa-compile I get the following:

Illegal octal digit '8' ignored at /usr/local/bin/sa-compile line 631,
$fh line 2436.
Wide character in print at /usr/local/bin/sa-compile line 385, $fh
line 2436.

The rules compile without errors, but this does seem strange...

-- 
Jorge Valdes
Intercom El Salvador




Segmentation Fault

2008-09-08 Thread Jorge Valdes
I am currently running SA 3.2.4 on perl 5.8.8, and want to upgrade perl
to 5.10.0.

After compiling perl 5.10.0 from source and rebuilding SA 3.2.4 with
the newly compiled perl binary, satisfying all required module
dependencies, I get a segmentation fault when it tries to load the
compiled rules:

# /usr/local/bin/spamassassin --lint --debug
...
[29188] dbg: zoom: loading compiled ruleset from
/var/lib/spamassassin/compiled/3.002004
Segmentation fault
#

Are the compiled rulesets specific to the perl version? I ask because I
have spamd running with the compiled rulesets and don't want to mess up
my current config. If they are perl-version specific, then I can go
ahead, recompile and restart spamd. Otherwise, I think I have found a
bug: if there is a version mismatch, spamassassin should be able to
detect it and, in this case, carry on without the compiled rulesets.

-- 
Jorge Valdes
Intercom El Salvador
[EMAIL PROTECTED]




Re: Feedback on 3.2.4

2008-01-23 Thread Jorge Valdes

Rick Macdougall wrote:

Skip wrote:
Other than the initial reports of performance boost from 3.2.4, I haven't
seen much discussion on it as yet.  Perhaps it is still too soon to know,
but has anyone been seeing other benefits - or identified potential
problems?



No problems with it at all here (around 7 servers upgraded) and the 
performance is greatly increased.  I went from a 1.4 second average 
scan time to 0.6 seconds average.


HTH,

Rick



Is this without network tests?
I ask because on my server I had:

Begin   : 2008-01-01
End     : 2008-01-15
Summary : 3.1.8

  Cnt       %   Average      Min      Max
-----  ------  --------  -------  -------
18968   46.2%     7.837    1.861   10.000
16640   40.6%    13.654   10.001   19.999
 2916    7.1%    23.892   20.003   30.000
 1379    3.4%    38.132   30.002   59.882
  184    0.4%    74.994   60.041   89.753
   37    0.1%    99.552   90.282  118.884
  904    2.2%   154.578  120.272  364.923

Begin   : 2008-01-21
End     : 2008-01-24
Summary : version 3.2.4

  Cnt       %   Average      Min      Max
-----  ------  --------  -------  -------
 5302   44.9%     7.431    3.872   10.000
 4737   40.1%    13.643   10.002   19.998
  869    7.4%    24.003   20.008   29.982
  555    4.7%    41.017   30.001   59.947
  126    1.1%    72.529   60.201   89.941
   24    0.2%   101.170   90.641  118.022
  201    1.7%   154.700  120.454  188.119

Going by the percentages alone, scan time is roughly the same on 
exactly the same hardware.
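
One way to put a single number on each period is a count-weighted mean of the per-bucket averages. The sketch below is an illustration, not part of the original report: it takes the Cnt and Average columns from the two tables and feeds them to a small awk helper (the name weighted_mean is made up).

```sh
#!/bin/sh
# Read "count average" pairs on stdin and print the count-weighted mean.
weighted_mean() {
    awk '{ n += $1; s += $1 * $2 } END { if (n > 0) printf "%.3f\n", s / n }'
}

# Cnt and Average columns of the 3.1.8 table.
printf '3.1.8: '
weighted_mean <<'EOF'
18968 7.837
16640 13.654
2916 23.892
1379 38.132
184 74.994
37 99.552
904 154.578
EOF

# Cnt and Average columns of the 3.2.4 table.
printf '3.2.4: '
weighted_mean <<'EOF'
5302 7.431
4737 13.643
869 24.003
555 41.017
126 72.529
24 101.170
201 154.700
EOF
```

Because the slowest bucket has a long tail, the two means are only a rough comparison and should be read alongside the bucket percentages.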



--
Jorge Valdes




Re: Fwd: FuzzyOcr - how do I teach it?

2007-02-23 Thread Jorge Valdes

Brian Wilson wrote:

On Feb 20, 2007, at 6:36 PM, Robert S wrote:


I have just installed FOCR 3.5.1 with the hashdb option.  I have been
receiving image spams about China Fruits Corporation which are
cleverly designed not to contain words in the words list.  How do I
insert the hash into the database and label this image as spam?

I have tried - unsuccessfully:

fuzzy-find --score=10 --learn-spam --verbose
367563:437:282:32::49:1:18:17:55642::44:40:7:37:54950::218:144:172:169:1131::96:99:179:107:1094::100:122:122:115:1093::156:136:162:145:1066 


(I got the hash score from running spamassassin -D  message)

and

fuzzy-find  --score=10 --learn-spam 'notary_public.gif'

I'd like to avoid tampering with the words list to avoid FPs.

Could somebody please tell me where I'm going wrong.

It would be nice if images could be automatically stored in the hashdb
as spam if SA gives them a positive score, but FOCR does not.



I have the same problem as you, so you are not alone.  I first deleted 
the hash using fuzzy-find to make sure it didn't exist in either hash, 
then added it with a score of 10.  I re-ran spamassassin with debug on 
for FuzzyOcr and it did not see the entry in the spam db.  I even 
compared the hashes and they were the same:


% fuzzy-find --delete 
278502:292:319:128::203:248:219:231:26298::202:200:236:205:25148::247:249:185:241:16996::192:236:242:224:16482::136:34:15:62:630::108:30:158:68:410 


Img =278502 292x319x128

% fuzzy-find --learn-spam --score=10 
278502:292:319:128::203:248:219:231:26298::202:200:236:205:25148::247:249:185:241:16996::192:236:242:224:16482::136:34:15:62:630::108:30:158:68:410 


Img =278502 292x319x128

Rerun the spam  through SA (China Fruits also: http://bubba.org/spam/)

Adding key to database...
[1548] dbg: FuzzyOcr: Not enough OCR Hits without space stripping, 
doing second matching pass...

[1548] info: FuzzyOcr: Message is ham, saving...
[1548] info: FuzzyOcr: Adding Hash to 
/etc/mail/spamassassin/FuzzyOcr.safe.db with score 0
[1548] dbg: FuzzyOcr: Digest: 
278502:292:319:128::203:248:219:231:26298::202:200:236:205:25148::247:249:185:241:16996::192:236:242:224:16482::136:34:15:62:630::108:30:158:68:410 





Remember that the safe database is always checked first.  The rationale 
is that if an image fingerprint is found there, there is no need to do 
OCR.  If the image has already been learned as HAM, you must delete it 
first, then optionally add it to the SPAM database.
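
A minimal sketch of that delete-then-learn order, using only the fuzzy-find options already quoted in this thread (--delete, --learn-spam, --score). The wrapper name relearn_as_spam and the FUZZY_FIND override are assumptions added for illustration.

```sh
#!/bin/sh
# relearn_as_spam DIGEST [SCORE]
# Delete the digest first (it may sit in the safe database, which is
# consulted before any OCR is done), then learn it as spam.
# FUZZY_FIND can be overridden, e.g. with "echo fuzzy-find" for a dry run.
FUZZY_FIND=${FUZZY_FIND:-fuzzy-find}

relearn_as_spam() {
    digest=$1
    score=${2:-10}
    $FUZZY_FIND --delete "$digest" &&
    $FUZZY_FIND --learn-spam --score="$score" "$digest"
}
```

Calling relearn_as_spam with the digest printed in the FuzzyOcr debug output runs both steps in order; with FUZZY_FIND="echo fuzzy-find" it only prints what would be run.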


Jorge.

--
-BEGIN GEEK CODE BLOCK-
Name: Jorge Valdes
EMail: jorgeatjoval.info
Version: 3.12
GED/J d+(-) s:+ a+ C++ ULS$ P$ L++ E--- W+++ N+ 
o? K- w+  M-@ V+ PS- PE+ Y? PGP-@ t++ 5@ X++ R tv+ b+ DI

D? G e++ h r+++ y+++
-END GEEK CODE BLOCK-



Re: lint errors

2007-01-22 Thread Jorge Valdes

Robert Fitzpatrick wrote:

I get the following lint errors:

esmtp# spamassassin --lint
Subroutine FuzzyOcr::O_NONBLOCK redefined at 
/usr/local/lib/perl5/5.8.6/Exporter.pm line 65.
 at /usr/local/lib/perl5/5.8.6/mach/POSIX.pm line 19
[98248] warn: FuzzyOcr: Cannot find executable for pamthreshold
[98248] warn: FuzzyOcr: Cannot find executable for tesseract

I found this regarding the first one, sounds like it can be ignored? Not
sure about the other two.

http://www.nabble.com/lint-error-on-FuzzyOcr-3.5.0-rc1-t2906332.html

  
The other two are warnings from FuzzyOcr that it could not find the 
executables for those programs.  You could ignore them and you should 
still be fine, as long as you still have a scanner available (ocrad|gocr).


Jorge.


Re: spammers dodging OCR

2006-11-06 Thread Jorge Valdes

Gary V wrote:
This morning I received my copy of networkworld. Here is an interesting 
article:


http://www.networkworld.com/columnists/2006/103006buzz-spammers-dodging-ocr.html 



Gary V






FuzzyOcr (devel version) is already catching these... has been for a 
while now.


--
Jorge Valdes




Re: ImageInfo vs FuzzyOCR performance?

2006-10-30 Thread Jorge Valdes

Michael Scheidell wrote:

-Original Message-
From: Jorge Valdes [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 27, 2006 5:12 PM

To: users@spamassassin.apache.org
Subject: Re: ImageInfo vs FuzzyOCR performance?

 SPAM Results:
   3936 Message(s) 49.83%
 19.399 Average Score

   3343 Time(s)   7.50%   84.93% Hit Rule: BAYES_99
   3068 Time(s)   6.88%   77.95% Hit Rule: HTML_MESSAGE
   1655 Time(s)   3.71%   42.05% Hit Rule: FUZZY_OCR
   1527 Time(s)   3.42%   38.80% Hit Rule: SARE_GIF_ATTACH
   1411 Time(s)   3.16%   35.85% Hit Rule: URIBL_BLACK
   1274 Time(s)   2.86%   32.37% Hit Rule: URIBL_BLACK_OVERLAP
   1271 Time(s)   2.85%   32.29% Hit Rule: MIME_HTML_ONLY
   1215 Time(s)   2.72%   30.87% Hit Rule: URIBL_JP_SURBL
   1187 Time(s)   2.66%   30.16% Hit Rule: RCVD_IN_BL_SPAMCOP_NET
   1184 Time(s)   2.66%   30.08% Hit Rule: SARE_GIF_STOX



What do you use to get those stats?


  

This is from a custom logwatch script that runs every morning...

http://www.joval.info/scripts/spamd

Jorge

--
Jorge Valdes
Intercom El Salvador
[EMAIL PROTECTED]
voz: ++(503) 2278-5068
fax: ++(503) 2265-7025



Re: ImageInfo vs FuzzyOCR performance?

2006-10-27 Thread Jorge Valdes

Jeff Chan wrote:

Does anyone have any recent feedback about the performance of
ImageInfo versus FuzzyOCR about detecting stock image spams (or
any others)?  Does FuzzyOCR catch significantly more spams than
ImageInfo?

Cheers,

Jeff C.
  
I may be biased, as I help with FuzzyOcr development, but I do use both.  
ImageInfo is fine and will get you part of the way there, but FuzzyOcr 
hits more often. Scanning ~8K messages/day, FuzzyOcr hits ~1600 times 
and ImageInfo hits 150 times on average. On my system, here are the 
top 10 rule hits from yesterday:


SPAM Results:
  3936 Message(s) 49.83%
19.399 Average Score

  3343 Time(s)   7.50%   84.93% Hit Rule: BAYES_99
  3068 Time(s)   6.88%   77.95% Hit Rule: HTML_MESSAGE
  1655 Time(s)   3.71%   42.05% Hit Rule: FUZZY_OCR
  1527 Time(s)   3.42%   38.80% Hit Rule: SARE_GIF_ATTACH
  1411 Time(s)   3.16%   35.85% Hit Rule: URIBL_BLACK
  1274 Time(s)   2.86%   32.37% Hit Rule: URIBL_BLACK_OVERLAP
  1271 Time(s)   2.85%   32.29% Hit Rule: MIME_HTML_ONLY
  1215 Time(s)   2.72%   30.87% Hit Rule: URIBL_JP_SURBL
  1187 Time(s)   2.66%   30.16% Hit Rule: RCVD_IN_BL_SPAMCOP_NET
  1184 Time(s)   2.66%   30.08% Hit Rule: SARE_GIF_STOX


Jorge Valdes




Re: Stock spam in images

2006-10-04 Thread Jorge Valdes

Jason Haar wrote:

I'm having marvelous luck with FuzzyOCR - but the spammers are learning too.

When I first started using it just a couple of months ago, it really
whacked the image-based spam. You could see why when gocr file.gif
returned nice text that was easy to match against.

However, now is a different matter. I just got a lose weight spam 10
minutes ago that gocr returns as:

  lI__c_tc)r _rc_hc_rihc_Ll _cnLl .h1c_Llic_;cll_ _u__c_c __ihc LI
  l c htc)hlc_rc)c_c_ B llr_ll l hc r_cp_


_ t4 __cc_'un ic) __'ri_c _ hH3s, t_k   _ ,r o_E,y _h K E,_
_ ,_ics r _ sncu)._r. t.ihk). lhirkrr x_))  '   gg __, r
_ Krvc)_H t)r r_irk cct .__ _
 O _' Y O ___ TE_ E
 _Lncl nLnn __ mc)R hnrtb

That tells me to go to www.realhgh dot org , but their GIF processing
munged it enough to slip by gocr

Not much FuzzyOCR can do with that :-(

  
A few days ago, someone provided me with an image that returned garbage 
when using plain 'gocr file'.  The trick to better detection is to 
adjust gocr's -l parameter (the grey-level threshold) to get better 
contrast, and therefore better results.  By looping over 0...255 you 
will find a setting that gives good results for this type of image.  If 
you start getting a lot of these images, adding another scanset with 
that setting will not add too many CPU cycles to your scan, and it will 
almost certainly give you better results with other images too.  Unless 
you have a really overloaded system, the extra scanset will not 'break 
the bank'.
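
A rough sketch of that loop. It runs gocr once per threshold and reports the threshold whose output contains the most letters and digits; that scoring rule is an assumption made for this example (FuzzyOcr itself matches against a word list), as are the function name and the GOCR override used for testing.

```sh
#!/bin/sh
# sweep_threshold IMAGE [STEP]
# Try gocr's -l (grey-level threshold) from 0 to 255 in steps of STEP and
# print "LEVEL COUNT" for the level that produced the most alphanumerics.
# GOCR can be overridden, e.g. to point at a stub for testing.
GOCR=${GOCR:-gocr}

sweep_threshold() {
    image=$1
    step=${2:-16}
    level=0
    while [ "$level" -le 255 ]; do
        chars=$($GOCR -l "$level" "$image" 2>/dev/null |
                tr -cd 'A-Za-z0-9' | wc -c | tr -d ' ')
        echo "$level $chars"
        level=$((level + step))
    done | sort -k2,2nr | head -n 1
}
```

For example, sweep_threshold file.pnm 8 tries every eighth level. The winning level is only a starting point; the output at that level should still be checked by eye before it goes into a new scanset.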


--
Jorge Valdes




Re: Infuriating gif spam...

2006-09-26 Thread Jorge Valdes

Steve [Spamassasin] wrote:

I've been getting a _lot_ of spam recently which has been defeating my
spamassassin configuration - all of it has the same general form... A
message with auto-generated prose and an image.  I installed FuzzyOCR
and this helped, but one particular variant still slips through.

The problematic spams all embed a GIF image which confuses gocr (in
spite of being easily human-readable) - though I'm not sure why.  Three
images which defeat FuzzyOCR for me are:

http://temporary.shic.dynalias.net/Evil_Spam_Samples.zip

I would like to know if there is a straightforward way either (a) to
configure FuzzyOCR to decode the text, or (b), assuming that is hard, a
way to identify this kind of 'strange' GIF and apply a static score to
them (at least as a temporary measure?)

Thanks in advance for any pointers...
  
Each of these GIFs contains multiple images (frames), and because the 
first one is 'junk', sending only that image through gocr yields no 
results.  You have to scan all of the images to find the text.  Try this 
with each file:


convert -append News.gif pnm:- | gocr -

I have an updated version of the FuzzyOcr plugin that has this and other 
improvements available here:


http://www.joval.info/proj/FuzzyOcr.html

--
Jorge Valdes
Intercom El Salvador
[EMAIL PROTECTED]




Hidden Option?

2006-06-28 Thread Jorge Valdes

Hi,

I just wanted to let everyone know that I found a spamd option that cannot 
be set from the command line: server-scale-period

According to the documentation, this option sets how long the system 
waits before deciding whether to spawn a new child; the current default 
is 2 seconds.  In my case, I wanted to wait longer so as not to spawn a 
child only to have it killed a couple of seconds later when the min-spare 
children became available again. I found that the only way to change this 
was in the spamd script itself, so I added the option manually around 
line 195:


'server-scale-period=i' => \$opt{'server-scale-period'},

Now I can set the option to my taste.

--
Jorge Valdes
Intercom El Salvador
[EMAIL PROTECTED]



Error when starting spamd 3.1.3

2006-06-28 Thread Jorge Valdes

Hi,

I get the following error when starting spamd:

error: Insecure dependency in `` while running with -T switch at 
/usr/local/lib/perl5/site_perl/5.8.6/Sys/Hostname/Long.pm line 91, 
GEN11 line 222.


System:
Solaris 9/sparc
Perl 5.8.6

This does not affect general operation, but it is annoying to see 
every time I restart spamd due to option changes and/or configuration 
changes.


--
Jorge Valdes
Intercom El Salvador
[EMAIL PROTECTED]




Re: Error when starting spamd 3.1.3

2006-06-28 Thread Jorge Valdes

Rosenbaum, Larry M. wrote:

From: Jorge Valdes [mailto:[EMAIL PROTECTED]

Hi,

I get the following error when starting spamd:

error: Insecure dependency in `` while running with -T switch at
/usr/local/lib/perl5/site_perl/5.8.6/Sys/Hostname/Long.pm line 91,
GEN11 line 222.

System:
 Solaris 9/sparc
 Perl 5.8.6

This does not affect general operation, but it is annoying to see
every time I restart spamd due to option changes and/or configuration
changes.



Try editing Long.pm and replacing this line:

my $tmp = `hostname` . '.' . `domainname`;

with this:

my $tmp = `hostname`;
my $tmp2 = `domainname`;
$tmp .= '.' . $tmp2;
  

Thanks, that did the trick!!

--
Jorge Valdes
[EMAIL PROTECTED]




Re: SA 3.04 and RHEL4, Net::DNS isn't working

2005-06-21 Thread Jorge Valdes

Steven Stern wrote:


On a brand new RHEL4 installation, I'm having problems with Net::DNS:

debug: is Net::DNS::Resolver available? yes
debug: Net::DNS version: 0.51
debug: trying (3) apache.org...
debug: looking up NS for 'apache.org'
debug: NS lookup of apache.org failed horribly => Perhaps your resolv.conf isn't pointing at a valid server?
debug: All NS queries failed => DNS unavailable (set dns_available to override)
debug: is DNS available? 0



Dig is able to find apache.org.  I've seen some posts on downgrading 
Net::DNS, but I can't find explicit instructions on how to do it.


I installed it via CPAN inside perl.



Steven,

I use a local DNS cache on my machine, and for some reason this 
confuses the test. When I configure the server to use real DNS 
servers, the test passes without problems, so I think it is just a 
problem with how the test was designed. I force-installed the upgrade 
and added the following to my local.cf:


#
## Force DNS
##
dns_available yes

Bingo... that did the trick. DNS checks are now enabled and I have not 
had problems with my setup.  I even changed the configuration back to 
use my local DNS cache and still have not seen problems...
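
For anyone who wants to script the same change, here is a small sketch that appends the directive only when the file does not already set it. The helper name is made up and the file path is whatever local.cf your installation uses; running spamassassin --lint afterwards is a sensible check.

```sh
#!/bin/sh
# ensure_dns_available FILE
# Append "dns_available yes" to FILE unless a dns_available line exists.
ensure_dns_available() {
    cf=$1
    if grep -Eq '^[[:space:]]*dns_available[[:space:]]' "$cf" 2>/dev/null; then
        echo "dns_available already set in $cf"
    else
        printf '\n## Force DNS\ndns_available yes\n' >> "$cf"
        echo "added dns_available yes to $cf"
    fi
}
```

The function is idempotent, so calling it a second time on the same file leaves the file unchanged.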


Hope it helps.

--
Jorge Valdes
Intercom El Salvador
[EMAIL PROTECTED]