Re: Applying a patch to Util.pm

2005-04-08 Thread mewolf1
In an older episode (Friday 08 April 2005 20:23), Stuart Johnston wrote:
> Theo Van Dinter wrote:
> > On Fri, Apr 08, 2005 at 06:28:25PM +0200, [EMAIL PROTECTED] wrote:
> > 
> >>>-  $uri =~ s,:\d+$,,gs; # port
> >>>+  $uri =~ s,:\d*$,,gs; # port
> >>
> >>How exactly should I apply the patch?
> > 
> > 
> > Since it's a 1 character change, you can just edit the file manually. :)

Good point, thanks!
Done.

> > 
> 
> While you're in there, you might want to change the line above it as 
> well (if you haven't already).  Bug #4213
> 
> -  $uri =~ s,[/\?\&].*$,,gs;# path/cgi params
> +  $uri =~ s,[/\?].*$,,gs;  # path/cgi params

Done. Thanks to you, too.



Re: Tables obscuring words

2005-04-08 Thread Jesse Houwing

> See yesterday's thread "Re: Extra Sare Rules for meds?"
> Jesse Houwing posted a beta-grade rule for this:
> BODY TABLEOBFU m{<td([^>]+|"[^"]+)>(<([^>]+|"[^"]+)>)*[a-z]{1,2}(<([^>]+|"[^"]+)>)*</td([^>]+|"[^"]+)>}i

 

Argh.  I hate it when I do that.  Looks like I just stopped reading that
thread too early.  This apparently doesn't hit on my sample yet, but
with tweakage it might. Working on that now.
I must have been half asleep yesterday ;)
It took me a bit of basic tweaking, but this one seems to do its job. I
have yet to finish my masscheck, but let's see how this does:

RAWBODY TABLEOBFU /<td([^>]|"[^"]*"|'[^']*')*>(<([^>]|"[^"]*"|'[^']*')*>)*[a-z]{1,2}(<([^>]|"[^"]*"|'[^']*')*>)*<\/td([^>]|"[^"]*"|'[^']*')*>/i
SCORE TABLEOBFU 2

Jesse



Anyone seen this?

2005-04-08 Thread Carnegie, Martin
We could only hope for more of this


http://abcnews.go.com/Technology/wireStory?id=653257





Re: URIDNSBL problem

2005-04-08 Thread Craig Baird
Quoting Matt Kettler <[EMAIL PROTECTED]>:

> Craig. One thing that REALLY jumps out at me is that there's no mention
> of init.pre by the rulefile parsing debug output.

And you would, of course, be absolutely correct.  That was the problem.  
My /etc/mail/spamassassin directory is NFS mounted read-only, so when I 
ran 'make install', it obviously wasn't able to copy that file over.

Thank you very much for your help, Matt.  I've been beating my head on this one 
for the last couple of hours.  My upgrade from 2.64 has been somewhat less 
than smooth, but I think I'm finally there.

Craig


Re: URIDNSBL problem

2005-04-08 Thread Matt Kettler
Craig Baird wrote:

>below.  25_uribl.cf has not been changed from defaults.  Can anyone see why my 
>URIDNSBL tests are not firing?
>

Craig. One thing that REALLY jumps out at me is that there's no mention
of init.pre by the rulefile parsing debug output.

That would be a bad thing. It should be in your /etc/mail/spamassassin
and it should be the very first thing parsed.

Init.pre is responsible for loading the URIBL plugin, and without it,
you're not going to get any URIBL tests at all.
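
For reference, the stock 3.0.x init.pre pulls the plugin in with a single
loadplugin line -- this is the line to check for:

  loadplugin Mail::SpamAssassin::Plugin::URIDNSBL

No init.pre, no URIDNSBL plugin, and the rules in 25_uribl.cf parse but
never run.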


URIDNSBL problem

2005-04-08 Thread Craig Baird
Well, now that my Net::DNS issues are fixed, my DNS blacklist tests are now 
working, but SURBLs are not.  I'm running the latest Net::DNS, and network 
tests are working.  I inserted the SURBL test point URL into sample-spam.txt, 
and I've pasted the output of:

spamassassin -D < sample-spam.txt

below.  25_uribl.cf has not been changed from defaults.  Can anyone see why my 
URIDNSBL tests are not firing?

Thanks!

Craig


debug: SpamAssassin version 3.0.2
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/bin/X11', which doesn't exist, dropping.
debug: Final PATH set 
to: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
debug: using "/usr/local/share/spamassassin" for default rules dir
debug: config: read file /usr/local/share/spamassassin/10_misc.cf
debug: config: read file /usr/local/share/spamassassin/20_anti_ratware.cf
debug: config: read file /usr/local/share/spamassassin/20_body_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_compensate.cf
debug: config: read file /usr/local/share/spamassassin/20_dnsbl_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_drugs.cf
debug: config: read file /usr/local/share/spamassassin/20_fake_helo_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_head_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_html_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_meta_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_phrases.cf
debug: config: read file /usr/local/share/spamassassin/20_porn.cf
debug: config: read file /usr/local/share/spamassassin/20_ratware.cf
debug: config: read file /usr/local/share/spamassassin/20_uri_tests.cf
debug: config: read file /usr/local/share/spamassassin/23_bayes.cf
debug: config: read file /usr/local/share/spamassassin/25_body_tests_es.cf
debug: config: read file /usr/local/share/spamassassin/25_hashcash.cf
debug: config: read file /usr/local/share/spamassassin/25_spf.cf
debug: config: read file /usr/local/share/spamassassin/25_uribl.cf
debug: config: read file /usr/local/share/spamassassin/30_text_de.cf
debug: config: read file /usr/local/share/spamassassin/30_text_fr.cf
debug: config: read file /usr/local/share/spamassassin/30_text_nl.cf
debug: config: read file /usr/local/share/spamassassin/30_text_pl.cf
debug: config: read file /usr/local/share/spamassassin/50_scores.cf
debug: config: read file /usr/local/share/spamassassin/60_whitelist.cf
debug: using "/etc/mail/spamassassin" for site rules dir
debug: config: read file /etc/mail/spamassassin/70_sare_adult.cf
debug: config: read file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf
debug: config: read file /etc/mail/spamassassin/70_sare_evilnum0.cf
debug: config: read file /etc/mail/spamassassin/70_sare_evilnum1.cf
debug: config: read file /etc/mail/spamassassin/70_sare_genlsubj0.cf
debug: config: read file /etc/mail/spamassassin/70_sare_header0.cf
debug: config: read file /etc/mail/spamassassin/70_sare_html0.cf
debug: config: read file /etc/mail/spamassassin/70_sare_html1.cf
debug: config: read file /etc/mail/spamassassin/70_sare_oem.cf
debug: config: read file /etc/mail/spamassassin/70_sare_specific.cf
debug: config: read file /etc/mail/spamassassin/70_sare_spoof.cf
debug: config: read file /etc/mail/spamassassin/70_sare_unsub.cf
debug: config: read file /etc/mail/spamassassin/70_sare_uri0.cf
debug: config: read file /etc/mail/spamassassin/70_sare_uri1.cf
debug: config: read file /etc/mail/spamassassin/72_sare_bml_post25x.cf
debug: config: read file /etc/mail/spamassassin/72_sare_redirect_post3.0.0.cf
debug: config: read file /etc/mail/spamassassin/88_FVGT_body.cf
debug: config: read file /etc/mail/spamassassin/88_FVGT_headers.cf
debug: config: read file /etc/mail/spamassassin/88_FVGT_rawbody.cf
debug: config: read file /etc/mail/spamassassin/88_FVGT_subject.cf
debug: config: read file /etc/mail/spamassassin/88_FVGT_uri.cf
debug: config: read file /etc/mail/spamassassin/99_FVGT_Tripwire.cf
debug: config: read file /etc/mail/spamassassin/99_FVGT_meta.cf
debug: config: read file /etc/mail/spamassassin/99_OBFU_drugs.cf
debug: config: read file /etc/mail/spamassassin/99_sare_fraud_post25x.cf
debug: config: read file /etc/mail/spamassassin/antidrug.cf
debug: config: read file /etc/mail/spamassassin/backhair.cf
debug: config: read file /etc/mail/spamassassin/bogus-virus-warnings.cf
debug: config: read file /etc/mail/spamassassin/chickenpox.cf
debug: config: read file /etc/mail/spamassassin/local.cf
debug: config: read file /etc/mail/spamassassin/nov2rules.cf
debug: using "/root/.spamassassin" for user state dir
debug

Re: Net::DNS trouble

2005-04-08 Thread Craig Baird
Quoting Chris Thielen <[EMAIL PROTECTED]>:

> If this is another debian box, I recommend sticking with debian packages 
> for everything.  Use CPAN to remove the package, then install it via 
> apt-get.
> 
> ii  libnet-dns-perl  
> 0.48-1   Perform DNS queries from a Perl script
> 
> If it claims you have this package installed, try "apt-get install 
> --reinstall libnet-dns-perl"


Thanks Chris.  This actually is a debian box.  I tried removing the CPAN 
package, and installing the Debian one.  No luck.  Still had the same 
problem.  However, after looking around a bit, I found that I apparently had 
an old version of Net::DNS (0.23) hanging around in another directory.  This 
old version seems to have come from razor-agents-sdk.  I manually deleted it, 
and SpamAssassin now sees Net::DNS 0.49, so I think I'm good to go.  Thanks to 
you and Jeff for your help.
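
(For anyone else chasing a stale module, a one-liner along these lines shows
which copy of Net::DNS perl actually loads, and from where:

  perl -MNet::DNS -le 'print "$Net::DNS::VERSION $INC{q(Net/DNS.pm)}"'

If the printed path isn't where you just installed, an older copy sits
earlier in @INC.)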

Craig


Re: Applying a patch to Util.pm

2005-04-08 Thread Stuart Johnston
Theo Van Dinter wrote:
> On Fri, Apr 08, 2005 at 06:28:25PM +0200, [EMAIL PROTECTED] wrote:
>> -  $uri =~ s,:\d+$,,gs; # port
>> +  $uri =~ s,:\d*$,,gs; # port
>>
>> How exactly should I apply the patch?
>
> Since it's a 1 character change, you can just edit the file manually. :)
While you're in there, you might want to change the line above it as 
well (if you haven't already).  Bug #4213

-  $uri =~ s,[/\?\&].*$,,gs;# path/cgi params
+  $uri =~ s,[/\?].*$,,gs;  # path/cgi params


Re: Applying a patch to Util.pm

2005-04-08 Thread Theo Van Dinter
On Fri, Apr 08, 2005 at 06:28:25PM +0200, [EMAIL PROTECTED] wrote:
> > -  $uri =~ s,:\d+$,,gs; # port
> > +  $uri =~ s,:\d*$,,gs; # port
> 
> How exactly should i apply the patch?

Since it's a 1 character change, you can just edit the file manually. :)

-- 
Randomly Generated Tagline:
bpi: Gambling term, as in, "You bet your bpi."




Re: --username flag

2005-04-08 Thread Chip
Matt Kettler wrote:
> At 07:43 AM 4/8/2005, Peter Marshall wrote:
>> Is the --username flag (with sa-learn) the same as running sa-learn
>> with that user ?
>
> Looking at the code to 3.0.2's sa-learn, I can't see what the
> --username flag does at all. The parameter is present, but I can't see
> it being used anywhere or being passed to anything.
>
> Some users have reported that it doesn't wind up changing the path to
> the selected username. Thus, I wonder if this feature works at all.
> I've not installed 3.0.2 myself, so I'm only speculating and echoing
> the reports of others.

The flag is only used when you use mysql or pgsql as the bayes storage
driver, otherwise it is ignored.
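
(For the archives: the rough shape of a local.cf under which --username
matters is the SQL bayes backend -- a sketch only, with the DSN and
credentials made up:

  bayes_store_module Mail::SpamAssassin::BayesStore::SQL
  bayes_sql_dsn      DBI:mysql:spamassassin:localhost
  bayes_sql_username sa
  bayes_sql_password password

With the default file-based bayes, sa-learn simply uses the invoking user's
~/.spamassassin database and the flag has nothing to select.)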


Applying a patch to Util.pm

2005-04-08 Thread mewolf1
In SpamAssassin version 3.0.2 running on Perl version 5.8.4 I have tried to 
apply the patch suggested in
http://bugzilla.spamassassin.org/show_bug.cgi?id=4191
but it does not work as expected:
> The fix is trival.  Apply the following patch to lib/SpamAssassin/Util.pm:
> 
> --- Util.pm.orig    Mon Mar 14 10:38:59 2005
> +++ Util.pm Mon Mar 14 10:39:12 2005
> @@ -788,7 +788,7 @@
>$uri =~ s#^[a-z]+:/{0,2}##gsi;   # drop the protocol
>$uri =~ s,^[^/]*\@,,gs;  # username/passwd
>$uri =~ s,[/\?\&].*$,,gs;# path/cgi params
> -  $uri =~ s,:\d+$,,gs; # port
> +  $uri =~ s,:\d*$,,gs; # port
> 
>return if $uri =~ /\%/; # skip undecoded URIs.
># we'll see the decoded version as well

I copied that from my browser to a file and ran
patch -p0 < /root/Util.pm.patch in
/usr/share/perl5/Mail/SpamAssassin.
Error:
patching file Util.pm
Hunk #1 FAILED at 788.
1 out of 1 hunk FAILED -- saving rejects to file Util.pm.rej

# cat Util.pm.rej 
***************
*** 788,794 ****
$uri =~ s#^[a-z]+:/{0,2}##gsi;   # drop the protocol
$uri =~ s,^[^/]*\@,,gs;  # username/passwd
$uri =~ s,[/\?\&].*$,,gs;# path/cgi params
-   $uri =~ s,:\d+$,,gs; # port
  
return if $uri =~ /\%/; # skip undecoded URIs.
# we'll see the decoded version as well
--- 788,794 ----
$uri =~ s#^[a-z]+:/{0,2}##gsi;   # drop the protocol
$uri =~ s,^[^/]*\@,,gs;  # username/passwd
$uri =~ s,[/\?\&].*$,,gs;# path/cgi params
+   $uri =~ s,:\d*$,,gs; # port
  
return if $uri =~ /\%/; # skip undecoded URIs.
# we'll see the decoded version as well

How exactly should I apply the patch?
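
(A hedged tip for anyone in the same spot: a hunk that looks identical but
still fails usually means the whitespace was mangled when the diff was
copied out of the browser. GNU patch can often cope with that via

  patch -p0 -l < /root/Util.pm.patch

where -l / --ignore-whitespace relaxes the whitespace matching. Failing
that, the one-character edit by hand, as suggested in the replies, is
simplest.)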



Re: Net::DNS trouble

2005-04-08 Thread Chris Thielen
Hi Craig,
Craig Baird wrote:
> Quoting Jeff Chan <[EMAIL PROTECTED]>:
>> The usual way problems like this happen is when upgrades are done
>> using different mechanisms, i.e. CPAN vs tarball vs Subversion,
>> etc.
>> The different upgrade mechanisms have different ways of keeping
>> track of versions, paths, etc. and if those methods are mixed
>> *for the same program* they can get confused.
>> One solution is to always use CPAN, always use tarballs, always
>> use subversion, etc.  I.e. pick one and stick with it.
>
> However, I still don't know how to fix this problem.  As I mentioned, I
> installed Net::DNS using CPAN.  When that didn't work, I also tried
> re-installing using the tarball.  I tried tarballs for 0.49 and 0.48 with
> the same results.  Any suggestions?

If this is another debian box, I recommend sticking with debian packages 
for everything.  Use CPAN to remove the package, then install it via 
apt-get.

ii  libnet-dns-perl  0.48-1   Perform DNS queries from a Perl script

If it claims you have this package installed, try "apt-get install 
--reinstall libnet-dns-perl"

HTH




Re: Net::DNS trouble

2005-04-08 Thread Craig Baird
Quoting Jeff Chan <[EMAIL PROTECTED]>:

> 
> The usual way problems like this happen is when upgrades are done
> using different mechanisms, i.e. CPAN vs tarball vs Subversion,
> etc.
> 
> The different upgrade mechanisms have different ways of keeping
> track of versions, paths, etc. and if those methods are mixed
> *for the same program* they can get confused.
> 
> One solution is to always use CPAN, always use tarballs, always
> use subversion, etc.  I.e. pick one and stick with it.

Hmmm... I see.  I did the upgrade via CPAN.  I can't remember for sure how I 
installed the previous version.  Anyway, knowing now that this problem can 
arise by mixing upgrade mechanisms, I'll try to stick with one method.

However, I still don't know how to fix this problem.  As I mentioned, I 
installed Net::DNS using CPAN.  When that didn't work, I also tried 
re-installing using the tarball.  I tried tarballs for 0.49 and 0.48 with the 
same results.  Any suggestions?

Craig


Re: RDJ and bogus virus warnings rule

2005-04-08 Thread Martin Hepworth
Tim,
thanks for the response. I guess the issues I saw (and others on the
list saw) were a result of the abusers eating up all your bandwidth.

Anyway, ta for the ruleset. Maybe one day people will also update all
their old MailScanner hosts (or configure them not to bounce) and you
can take those rules out of the set ;-)

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300
Tim Jackson wrote:
> I'm not sure I'm aware of "regular" problems. To put some certainty to the
> speculation in this thread and various theories I've received, as of last
> week I implemented a rate limit whereby each unique IP address may only do
> one HTTP GET and one HTTP HEAD request on the ruleset once in any given 24
> hour period. [...]



Re: RDJ and bogus virus warnings rule

2005-04-08 Thread Tim Jackson
Martin Hepworth  solid-state-logic.com> writes:
 
> Tim does seem to have quite a few problems with people getting to this 
> on a regular basis. 

(Hello; I don't follow the SA list routinely at the moment purely due to volume
& time pressures)

I'm not sure I'm aware of "regular" problems. To put some certainty to the
speculation in this thread and various theories I've received, as of last week I
implemented a rate limit whereby each unique IP address may only do one HTTP GET
and one HTTP HEAD request on the ruleset once in any given 24 hour period.
(Actually, thanks to a helpful suggestion by Matthew Turnbull, it's slightly
less than 24 hours to take account of cronjobs not running at exactly the same
time each day, but the principle is the same.)

I'm sorry if it has caused any inconvenience to anyone but I have had to do this
to counter the abuse caused by a very small minority of users who eat my
bandwidth downloading the (large) ruleset very regularly, often without even
checking (via HTTP HEAD or conditional HTTP GET) whether it has changed. In the
(probably unlikely) event that anyone reading this is one of those idiots who
has a script that unconditionally downloads 100K off my site once a minute, 24
hours a day: stop being so downright selfish.

I appreciate that the "once a day per IP" is not a particularly good solution,
particularly for those behind a transparent proxy or on a large NAT network but
in the absence of any better ideas that's what I've done for now. If it is
causing anyone particular problems for any reason, please do contact me and I
will try to work something out. I'm not trying to stop anyone responsible having
reasonable access to the list.

For everyone else: check once a day or less and you shouldn't even notice the
restrictions as long as you have your own unique IP. If you're doing that and
still having problems, let me know. There's absolutely no need to check more
than once per day.

If you're in doubt about what's going on, visit the URL in a web browser from
the machine you are trying to download the rules from (use lynx/elinks or
whatever if you're on a command-line-administered server). You will get a clear
error message telling you why you can't access it: if you're requesting the page
too soon after your last request you will get a message similar to this:

"Sorry, someone (possibly you) using your IP address (XX.XX.XX.XX) performed a
successful GET request on this page not long ago. Please do not check this page
more frequently than once per day."

where XX.XX.XX.XX is the requesting IP seen from my end. This is a temporary,
dynamic block which will automatically clear 24 hours after your last
*successful* request (meaning that if, for example, you check every 6 hours, you
will not keep resetting the timer and thus be refused forever; approximately 1
in 4 checks will succeed).

There are a small number of IPs that I have permanently blacklisted for more
serious abuse. In the unlikely event that you're on that list, you will get a
different page entitled "BANNED FOR ABUSIVE BEHAVIOUR" explaining that you need
to contact me. In that case, you will not be able to access the list from the IP
in question unless I manually remove you. I should note that this blacklist is
not new; I have been blacklisting some IPs for a while.


Hope that clears it all up and again apologies for any inconvenience.


Tim




report_safe doesn't seem to work since FC3 upgrade

2005-04-08 Thread Chris Harvey

I upgraded to FC3 this last weekend and I just noticed today that the mail
in my junk folder is not encapsulated/wrapped like it was before.

I checked my config file and have:

required_hits 4.5
rewrite_header Subject **SPAM(_SCORE_)**
report_safe 1
use_bayes 1

So it seems ok, but it's definitely seeing spam, milter-tagging it and then
filing it without wrapping it anymore.

Anyone having the same issue?
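
(A quick way to split the problem in half, as a sketch: run a saved spam
straight through SA from a shell,

  spamassassin < saved-spam.msg > out.msg

If out.msg comes back encapsulated, report_safe itself is fine and the
milter side is the place to look; if it doesn't, SA isn't reading the
config file you think it is -- spamassassin -D --lint shows which files
get loaded.)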



Re: --username flag

2005-04-08 Thread Matt Kettler
At 07:43 AM 4/8/2005, Peter Marshall wrote:
Is the --username flag (with sa-learn) the same as running sa-learn with 
that user ?
Looking at the code to 3.0.2's sa-learn, I can't see what the --username 
flag does at all. The parameter is present, but I can't see it being used 
anywhere or being passed to anything.

Some users have reported that it doesn't wind up changing the path to the 
selected username. Thus, I wonder if this feature works at all. I've not 
installed 3.0.2 myself, so I'm only speculating and echoing the reports of 
others.



Re: Local scores

2005-04-08 Thread Matt Kettler
At 06:37 AM 4/8/2005, Jon Gerdes wrote:
> Could someone please explain this for me:
>
> Header from an e-mail I received:
> --8<--
> X-Spam-Status: No, score=4.6 required=5.6 tests=BAYES_40,DATE_IN_PAST_12_24,
> RCVD_IN_BL_SPAMCOP_NET autolearn=no version=3.0.2
>
> score RCVD_IN_BL_SPAMCOP_NET 5.0
> --8<--

Yeah? What's to explain? Are you trying to figure out why the total is less
than 5.0? BAYES_40 carries a negative score.

 5.0 + -1.096 + 0.703 = 4.607




Re: Extra Sare Rules for meds?

2005-04-08 Thread Keith Ivey
Chris Santerre wrote:
> We often replace it with something like \w{0,15} or whatever. Helps the
> code.
Ah, sorry.  I understand what you meant about '*' now.  I 
thought you were talking about '+' versus '*', but your 
observation would apply just as well to '+', which should be 
replaced by '{1,15}' or something.

I think these changes aren't so necessary if you're applying the 
quantifier to something that's unlikely to match a long string 
of characters.  Obviously '.*' is bad, but '\s*' isn't going to 
hurt much, and '[^>]*' probably isn't much of a problem in HTML. 
 Still, adding limits is easy enough, especially since spammers 
could construct messages with "unlikely" sequences.

--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC


RE: Extra Sare Rules for meds?

2005-04-08 Thread Chris Santerre


>-Original Message-
>From: Gray, Richard [mailto:[EMAIL PROTECTED]
>Sent: Friday, April 08, 2005 9:43 AM
>To: users@spamassassin.apache.org
>Subject: RE: Extra Sare Rules for meds?
>
>
>> 
>> One of the things the SARE group has realized, is that using 
>> '*' in any regex is a bad idea. Trust me on that one. We 
>> avoid it like the plague. 
>> 
>> --Chris 
>> 
>
>Are there any other rules of thumb such as this that would be really
>useful to know?
>
>Many thanks,
>
>Richard

We often replace it with something like \w{0,15} or whatever. Helps the
code. 

Also, if you plan on writing a ruleset, you write it completely differently
for testing than for final use. The testing is long and tedious. Coding is
somewhat different, as you want to see results for different instances of
a single rule. 

You WANT results from different people. I can't suggest that enough! One
person will show an S/O rating of .95, while another shows .42! 

I think everyone already knows that if you have a 'set' you precede it with
a '?:' as in

body stupidexample /(?:s|f|g|beg)un/i

Don't ask why, just do it ;) 
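
To make the '*' point concrete, the substitution we mean looks like this
(rule names invented for illustration):

  body EX_UNBOUNDED /v\W*i\W*a\W*g\W*r\W*a/i
  body EX_BOUNDED   /v\W{0,3}i\W{0,3}a\W{0,3}g\W{0,3}r\W{0,3}a/i

The bounded form caps how much text the engine may swallow between letters,
which keeps backtracking under control on large message bodies.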

--Chris 


Re: Extra Sare Rules for meds?

2005-04-08 Thread Keith Ivey
Chris Santerre wrote:
>> m{<td([^>"]+|"[^"]*")*>(<([^>"]+|"[^"]*")*>)*[a-z]{1,2}(<([^>"]+|"[^"]*")*>)*</td([^>"]+|"[^"]*")*>}i
>>
>> The other problem with the pattern as written (with no *) is
>> that the subpatterns don't match plain <td> or </td>, since they
>> require at least one character between the td and the >.
>
> One of the things the SARE group has realized, is that using '*' in any
> regex is a bad idea. Trust me on that one. We avoid it like the plague.

I'm sure that '*' causes problems in certain contexts, but a 
blanket prohibition on it seems excessive.  I know it's 
particularly problematic when applied to something that can match 
an empty string, but that's not the case here (and the problem 
would apply just as much with '+').  Actually the real danger 
with '*' is probably having it in a context where another '*' or 
a '+' applies to it -- something like '([^"]*|"")*', which 
should be '([^"]+|"")*'.

The worst effect of avoiding '*' in this case is that the 
original regex contains '</td([^>]+|"[^"]+)>', which doesn't 
match plain '</td>', which is surely going to be the most common 
way to close a table cell.

In any case, as originally written the '|"[^"]+' part of the 
regex is useless unless spammers are really using things like
   <td"...>
It doesn't match things like
   <td width="a>b">
which is apparently what was intended, otherwise there wouldn't 
be much point in not just leaving it at plain '[^>]+'.

--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC


RE: Extra Sare Rules for meds?

2005-04-08 Thread Gray, Richard
> 
> One of the things the SARE group has realized, is that using 
> '*' in any regex is a bad idea. Trust me on that one. We 
> avoid it like the plague. 
> 
> --Chris 
> 

Are there any other rules of thumb such as this that would be really
useful to know?

Many thanks,

Richard








RE: Extra Sare Rules for meds?

2005-04-08 Thread Chris Santerre


>-Original Message-
>From: Keith Ivey [mailto:[EMAIL PROTECTED]
>Sent: Thursday, April 07, 2005 10:32 PM
>To: users@spamassassin.apache.org
>Cc: Jesse Houwing
>Subject: Re: Extra Sare Rules for meds?
>
>
>Jesse Houwing wrote:
>
>> BODY TABLEOBFU 
>> 
>m{<td([^>]+|"[^"]+)>(<([^>]+|"[^"]+)>)*[a-z]{1,2}(<([^>]+|"[^"]+)>)*</td([^>]+|"[^"]+)>}i
>
>I think you may want a * after the ) inside the <>.  As it is, 
>you're looking for either a bunch of characters that are not > 
>or a quote followed by a bunch of characters that are not quote. 
>  In fact, I think what was really intended was something more 
>like this (note that this also requires an ending quote on 
>contained quoted strings and allows ""):
>
>m{<td([^>"]+|"[^"]*")*>(<([^>"]+|"[^"]*")*>)*[a-z]{1,2}(<([^>"]+|"[^"]*")*>)*</td([^>"]+|"[^"]*")*>}i
>
>
>The other problem with the pattern as written (with no *) is
>that the subpatterns don't match plain <td> or </td>, since they
>require at least one character between the td and the >.
>

One of the things the SARE group has realized, is that using '*' in any
regex is a bad idea. Trust me on that one. We avoid it like the plague. 

--Chris 


RE: bogus-virus-warnings-cf

2005-04-08 Thread Chris Santerre


>-Original Message-
>From: Nick Leverton [mailto:[EMAIL PROTECTED]
>Sent: Friday, April 08, 2005 8:23 AM
>To: users@spamassassin.apache.org
>Subject: Re: bogus-virus-warnings-cf
>
>
>On Sat, Apr 02, 2005 at 05:09:40PM -0600, Chris wrote:
>> I use RDJ to update rule sets, I only run it once a day.  On the run
>> for the 31st of March, RDJ reported:
>> 
>> RulesDuJour Run Summary on cpollock.localdomain:
>> 
>> The following rules had errors:
>> Tim Jackson's (et al) bogus virus warnings was not retrieved because of:
>> 403 from http://www.timj.co.uk/linux/bogus-virus-warnings.cf.
>> 
>> clicking on the link and opening with Mozilla still shows a 403 -
>> Permission Denied.  Anyone else having problems getting this update?
>
>Yes, I've been corresponding with Tim about it.  Due to abuse such as
>some people doing GETs as often as once per second, he's had to rate
>limit the server.  At present you should be able to do one HEAD and one
>GET per IP address per day.  He said he may remove the limit on HEADs
>when he sees how it all responds.
>
>Nick

Nick, ask him if he would like us to mirror it at SARE. We rate limit all of
the GETs. We even blacklist the fools that try to update every minute. 

Chris Santerre 
System Admin and SARE Ninja
http://www.rulesemporium.com 


auto_learn and use_bayes

2005-04-08 Thread Peter Marshall
When you use auto_learn it updates the bayes database, right ?  If you 
have this in the main local.cf file, does it update the system-wide 
bayes database or the local user's ?  Does "use_bayes" turn on system-wide 
or individual ?

I guess I would like to know how I can tell which is being used ?
How do I know it is running for each individual user ?  (is this only 
through running sa-learn as that user).

How do I know it is running system wide ?  (is that through use_bayes 
and auto_learn).

If I run on a per-user basis, do I need "use_bayes and auto_learn" in 
the local.cf ?  The man page says that auto_learn is good, but should be 
supplemented with sa-learn.  Just curious if auto_learn affects all users' 
bayes databases, or the system one.
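
(A note on scoping with the stock file-based bayes: the database used is
whatever bayes_path resolves to for the Unix user SA runs as, which defaults
to that user's ~/.spamassassin/bayes files. A deliberately site-wide setup
would pin it in local.cf -- paths here are only an example:

  bayes_path /var/lib/spamassassin/bayes/bayes
  bayes_file_mode 0666

auto_learn then updates whichever database is in effect for the message
being scanned.)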

Sorry for all the questions this morning ... I am just a little confused ...
Peter


Re: bogus-virus-warnings-cf

2005-04-08 Thread Nick Leverton
On Sat, Apr 02, 2005 at 05:09:40PM -0600, Chris wrote:
> I use RDJ to update rule sets, I only run it once a day.  On the run for the 
> 31st of March, RDJ reported:
> 
> RulesDuJour Run Summary on cpollock.localdomain:
> 
> The following rules had errors:
> Tim Jackson's (et al) bogus virus warnings was not retrieved because of: 403 
> from http://www.timj.co.uk/linux/bogus-virus-warnings.cf.
> 
> clicking on the link and opening with Mozilla still shows a 403 - Permission 
> Denied.  Anyone else having problems getting this update?

Yes, I've been corresponding with Tim about it.  Due to abuse such as
some people doing GETs as often as once per second, he's had to rate
limit the server.  At present you should be able to do one HEAD and one
GET per IP address per day.  He said he may remove the limit on HEADs
when he sees how it all responds.

Nick


--username flag

2005-04-08 Thread Peter Marshall
Is the --username flag (with sa-learn) the same as running sa-learn with 
that user ?

ie.  as root
sa-learn --username bob 
sa-learn --username fred ...
Could you do this and obtain individual bayes databases the same way 
that you would if you ran individual crons for each user ?
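
(For comparison, the approach that definitely yields per-user databases
with file-based bayes is switching user rather than passing a flag -- a
sketch, with the folder path assumed:

  su - bob -c 'sa-learn --spam ~/Maildir/.missed-spam/cur'

since sa-learn then reads and writes bob's own ~/.spamassassin/bayes
files.)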

Thanks,
Peter


Re: WHich is better

2005-04-08 Thread Peter Marshall
Hi Robert,
Thank you very much for your detailed reply.  It was very helpful.  I 
just have one question.  Why can you not run sa-learn on spam already 
flagged as spam?  I thought SpamAssassin would rip out any headers it 
already added.  If that is the case, then what is the harm in re-learning 
the spam as spam ... (I am just asking .. not trying to argue ... just 
curious).

Thank you again for your help,
Peter
Robert Menschel wrote:
> PM> spamassassin will automatically put all mail marked as spam in this
> PM> folder.  Obviously I will use this folder to run salearn on for spam.
>
> NO. NO. NO. NO.
>
> Do not run sa-learn on automatically flagged emails. SA does this
> itself somewhat conservatively (though not conservatively enough --
> I suggest lowering the ham auto-learn threshold).
>
> Provide instead a "missed-spam" folder and a "not-spam" folder. Have
> your people copy/move miscategorized emails into those, and learn from
> those folders. [...]

--
Peter Marshall, BCS
System Administrator, CARIS
CARIS 2005 - Mapping a Seamless Society
10th International User Group Conference and Educational Sessions
Halifax, NS, Canada
E-mail [EMAIL PROTECTED] for more.


Re: Local scores

2005-04-08 Thread Jon Gerdes
Thanks for the very prompt replies. 

My scores are:

50_scores.cf:score BAYES_40 0 0 -0.276 -1.096
50_scores.cf:score DATE_IN_PAST_12_24 0.374 0 0.571 0.703

and

RCVD_IN_BL_SPAMCOP_NET 5.0

Then given this from docs:

If four valid scores are listed, then the score that is used depends on how
SpamAssassin is being used. The first score is used when both Bayes and
network tests are disabled (score set 0). The second score is used when
Bayes is disabled, but network tests are enabled (score set 1). The third
score is used when Bayes is enabled and network tests are disabled (score
set 2). The fourth score is used when Bayes is enabled and network tests
are enabled (score set 3)

I have Bayes enabled and Network test enabled (seeing as both types were scored 
- BAYES_40 and RCVD...), so score set 3 is the one

So I would expect 5.0 + -1.096 + 0.703 = 4.607 which is exactly what I got!!!  
Gosh it's all working just as it says on the tin 8)

Very sorry for wasting your time.

Cheers
Jon Gerdes

>>> "Loren Wilton" <[EMAIL PROTECTED]> 04/08/05 11:57am >>>
BAYES_40 will slightly lower the score.  However I'd think that
RCVD_IN_BL_SPAMCOP_NET should crank up the score by a good amount and more
than compensate for bayes.

Loren





Re: Local scores

2005-04-08 Thread Loren Wilton
BAYES_40 will slightly lower the score.  However I'd think that
RCVD_IN_BL_SPAMCOP_NET should crank up the score by a good amount and more
than compensate for bayes.

Loren

- Original Message - 
From: "Jon Gerdes" <[EMAIL PROTECTED]>
To: 
Sent: Friday, April 08, 2005 3:37 AM
Subject: Local scores


>
> Could someone please explain this for me:
>
> Header from an e-mail I received:
> --8<--
> X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on
cesium.whl.co.uk
> X-Spam-Status: No, score=4.6 required=5.6
tests=BAYES_40,DATE_IN_PAST_12_24,
> RCVD_IN_BL_SPAMCOP_NET autolearn=no version=3.0.2
> X-SA-Exim-Version: 4.1 (built Mon, 01 Nov 2004 09:10:08 +)
> --8<--
>
> Note that the total score is 4.6
>
> Excerpt from my local.cf:
> --8<--
> score GTUBE 120.0
> score RCVD_IN_BL_SPAMCOP_NET 5.0
> --8<--
>
> Note that I've scored up RCVD_IN_BL_SPAMCOP_NET to 5.0.
>
> I know that the other local scores work OK because I can send GTUBE in
> for a pretty large score
>
> Cheers
> Jon Gerdes



Local scores

2005-04-08 Thread Jon Gerdes

Could someone please explain this for me:

Header from an e-mail I received:
--8<--
X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on cesium.whl.co.uk
X-Spam-Status: No, score=4.6 required=5.6 tests=BAYES_40,DATE_IN_PAST_12_24,
RCVD_IN_BL_SPAMCOP_NET autolearn=no version=3.0.2
X-SA-Exim-Version: 4.1 (built Mon, 01 Nov 2004 09:10:08 +)
--8<--

Note that the total score is 4.6

Excerpt from my local.cf:
--8<--
score GTUBE 120.0
score RCVD_IN_BL_SPAMCOP_NET 5.0
--8<--

Note that I've scored up RCVD_IN_BL_SPAMCOP_NET to 5.0.  

I know that the other local scores work OK because I can send GTUBE in for a 
pretty large score

Cheers
Jon Gerdes




SpamAssassin DNS/SURBL bug possibly discovered, fixed

2005-04-08 Thread Jeff Chan
See:

http://bugzilla.spamassassin.org/show_bug.cgi?id=4249
http://bugzilla.spamassassin.org/show_bug.cgi?id=3997

It would be interesting to see if folks could try the patches
added to the tickets, try a slow DNS resolution (longer than the
timeout of 3 seconds) and see if they can duplicate/eliminate the
error.  Also if the hypothesis on 4249 is correct, then the
problem of mixed up UDP packets may show up on a network
analyzer, tcpdump, etc. 
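
(On the capture side, something like 'tcpdump -n udp port 53' on the
resolver's interface, checking whether reply IDs pair up with the queries
SA actually sent, would be one way to confirm or kill the hypothesis --
an untested suggestion.)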

Jeff C.
--
"If it appears in hams, then don't list it."



Re: Net::DNS trouble

2005-04-08 Thread Jeff Chan
On Thursday, April 7, 2005, 4:50:00 PM, Craig Baird wrote:
> I just attempted an upgrade from SA 2.64 to 3.0.2, and am now having problems 
> with SURBLs and RBLs not working.  I upgraded all of the perl modules 
> mentioned in INSTALL to the latest versions prior to installing SA 3.0.2, 
> including Net::DNS, which is at version 0.49.  When I run:

> spamassassin -D --lint

> I get the following two messages relating to Net::DNS:

> debug: diag: module installed: Net::DNS, version (undef)
> debug: is Net::DNS::Resolver available? no
> debug: is DNS available? 0

> I assume this means that SpamAssassin can't figure out what version of 
> Net::DNS I'm running, and is therefore failing to use it.  I tried 
> downgrading 
> Net::DNS to version 0.48 with the same results.

> I have four SA servers, all with Debian Woody, and have tried to upgrade two 
> of them to 3.0.2.  This problem is happening on both of these machines.

The usual way problems like this happen is when upgrades are done
using different mechanisms, i.e. CPAN vs tarball vs Subversion,
etc.

The different upgrade mechanisms have different ways of keeping
track of versions, paths, etc. and if those methods are mixed
*for the same program* they can get confused.

One solution is to always use CPAN, always use tarballs, always
use subversion, etc.  I.e. pick one and stick with it.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



RE: SA 3.02 rewrite_mail problem

2005-04-08 Thread Steven Manross
Well, it seems I fixed my own problem...  The problem was corrected by
taking the array out of the equation and just sending the $message_txt
to the factory objects. (I also made config changes to the params for the
$spamtest object, but didn't get it working until taking out the array
references (@array = split(...)) from the code below.)

It seems odd that it would only mess up on the real mail, but it's fixed
now so I won't worry about such trivialities.  :)
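
(Concretely, per the description above, the working version boils down to
handing parse() the raw text instead of a reference to the split array --
a sketch:

  my $mail = $spamtest->parse($message_txt, 1);

Mail::SpamAssassin's parse() accepts the message as a plain scalar as well
as an array ref.)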

Thanks for the continuing great work on the Module, Rulesets, and last
but not least, documentation.

Steven

-Original Message-
From: Steven Manross 
Sent: Monday, March 28, 2005 9:13 AM
To: [EMAIL PROTECTED]
Subject: SA 3.02 rewrite_mail problem


I'm having issues rewriting mail. 

In the case that spam has been found, the message rewrites fine.

In the case of real mail, I want the SA headers inserted, but it seems
that if I rewrite_mail on non-spam, the message loses all headers
(except SA headers) and improperly formats the message text (HTML
becomes text, with visible HTML tags).

I must say that I am very impressed with the status of 3.02 and am
hoping this is a config problem on my end, but can't seem to find my own
answer in my searches for it.

Upgrading from 2.60 was relatively painless and it looks to handle the
message parts better.  This code was working on 2.60 (with changes based
on version differences).

The workaround for now is NOT to change the $message_text with
rewrite_mail for non-spam, as doing so will make a mess of the mail.
GTUBE seems fine, and the nonspam (after) seems hosed.  Any help would be
appreciated. (example below)

I have added the "add_header all" config line in the local.cf for each
of the headers I want in each message.

Code snippet follows.

SA 3.02
ActivePerl 5.8 (Build 811)
W2K SP4
Exchange Server 2000 (all available SPs and Hotfixes)

my $spamtest = new Mail::SpamAssassin ({
  userprefs_filename   => 'X:/spam/assassin/prefs/user_prefs',
  local_tests_only => 1,
  username => 'someuser'
});

if (is_message_spam($message_text,$spamtest,$mailobj,$statusobj)) {
  #do stuff
} else {
  #do other stuff
}

exit 1;

sub is_message_spam {
  my $message_txt = $_[0];
  my $spamtest = $_[1];
  my @array = split(/\n/,$message_txt);
  my $mail = $spamtest->parse(\@array,1);
  my $status = $spamtest->check($mail);
  $mail = $status->get_message();
  $_[2] = $mail;
  $_[3] = $status;
  #$message_txt = $status->rewrite_mail();
  #$_[0] = $message_txt;
  if ($status->is_spam()) {
$message_txt = $status->rewrite_mail();
$_[0] = $message_txt;
return 1;
  } else {
return 0;
  }
}


NON-SPAM AFTER
--
X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on 
homeexch2.manross.net
X-Spam-Level: 
X-Spam-Status: No, score=0.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_60
	autolearn=no version=3.0.2

This is a multi-part message in MIME format.

--Message-Boundary-19990614
Content-Type: text/plain;
charset="US-ASCII"
Content-Description: Mail message body
Content-Transfer-Encoding: 7bit

This is a message from Perl with an attachment.
--Message-Boundary-19990614
Content-Type: text/plain;
type=Unknown;
name="boot.ini"
Content-Description: boot.ini
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment

[boot
loader]timeout=3D30default=3Dmulti(0)disk(0)rdisk(0)partition(1)\WINN
T[o
perating systems]multi(0)disk(0)rdisk(0)partition(1)\WINNT=3D"My 
System"
--Message-Boundary-19990614--

NONSPAM BEFORE -- HEADERS
-
thread-index: AcUy9fmcNHhQyjavTVKbUKvxBooZ4g==
Received: from localhost ([x.x.x.x]) by .xxx.xxx with
Microsoft SMTPSVC(5.0.2195.6713); Sun, 27 Mar 2005 10:53:57 -0700
Content-Transfer-Encoding: 7bit
To: <[EMAIL PROTECTED]>
Content-Class: urn:content-classes:message
Importance: normal
Priority: normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1478
From: "ME" <[EMAIL PROTECTED]>
X-Mailer: Perl+Mail::Sender 0.7.08 by Jan Krynicky
Subject: Perl Mail with Attachment test
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
boundary="Message-Boundary-19990614"
Return-Path: <[EMAIL PROTECTED]>
Message-ID: <[EMAIL PROTECTED]>
X-OriginalArrivalTime: 27 Mar 2005 17:53:57.0443 (UTC)
FILETIME=[F2F4C130:01C532F5]
Date: 27 Mar 2005 10:53:57 -0700


Re: Extra Sare Rules for meds?

2005-04-08 Thread Keith Ivey
Jesse Houwing wrote:
> BODY TABLEOBFU
> m{<td([^>]+|"[^"]+)>(<([^>]+|"[^"]+)>)*[a-z]{1,2}(<([^>]+|"[^"]+)>)*</td([^>]+|"[^"]+)>}i

I think you may want a * after the ) inside the <>.  As it is, 
you're looking for either a bunch of characters that are not > 
or a quote followed by a bunch of characters that are not quote. 
 In fact, I think what was really intended was something more 
like this (note that this also requires an ending quote on 
contained quoted strings and allows ""):

m{<td([^>"]+|"[^"]*")*>(<([^>"]+|"[^"]*")*>)*[a-z]{1,2}(<([^>"]+|"[^"]*")*>)*</td([^>"]+|"[^"]*")*>}i

The other problem with the pattern as written (with no *) is 
that the subpatterns don't match plain <td> or </td>, since they 
require at least one character between the td and the >.

--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC


Re: WHich is better

2005-04-08 Thread Robert Menschel
Hello Peter,

Thursday, April 7, 2005, 5:29:38 AM, you wrote:

PM> I have been building a new mailserver to replace my old one.
PM> The new one has postfix, Cyrus-imap, anomy, spamassassin.  I am trying
PM> to set up the bays auto-learn stuff.  Each user has a home directory on
PM> the server (they can not log onto the server).  I am using the Maildir
PM> format.

PM> Is it better to have a cron job run by a single user (say root) to do
PM> the ham / spam learning for everyone, or should I run a cron for each
PM> individual user.  All users belong to the same company.

Best, if you have the disk space for the multitude of Bayes databases,
is to run ham/spam learning as each user. I'd recommend the "running
constantly if I staggered it for every user," something like:
- run as cron
- get cycle start time
- identify list of active users
- for each active user
  - determine if anything to learn; skip to next user if not
  - su to that user's id
  - sa-learn
- if not yet 30 min since start of this cycle, sleep 15 min
- loop to next cycle.
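
A rough shell rendering of that outline (every path below, and using /home
entries as the list of active users, is an assumption):

  #!/bin/sh
  # staggered per-user learning, looping in roughly 30-minute cycles
  while :; do
    start=`date +%s`
    for u in `ls /home`; do
      # treat anyone with a learn folder as active; folder names are examples
      [ -d "/home/$u/Maildir/.missed-spam/cur" ] || continue
      su - "$u" -c 'sa-learn --spam ~/Maildir/.missed-spam/cur'
      su - "$u" -c 'sa-learn --ham  ~/Maildir/.not-spam/cur'
    done
    now=`date +%s`
    # if the cycle finished in under 30 minutes, sleep 15 before the next
    [ `expr $now - $start` -lt 1800 ] && sleep 900
  done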

PM> Problem I have thought of with the latter.
PM> 1.  There would be approximately 130 cron jobs running sa-learn at the
PM> same time  or it would run constantly if I staggered it for every
PM> user.  What kind of load will that have on  my 850 with 756 MB of ram ?

running constantly, staggered, will work better on that system (IMO)
than allowing multiple executions at the same time.

PM> Problems I have with both:
PM> 1.  What is the best method of obtaining the spam / ham.  I have the
PM> server create a spam folder for each user when the user is created.
PM> spamassassin will automatically put all mail marked as spam in this
PM> folder.  Obviously I will use this folder to run salearn on for spam.

NO. NO. NO. NO.

Do not run sa-learn on automatically flagged emails. SA does this
itself somewhat conservatively (though not conservatively enough --
I suggest lowering the ham auto-learn threshold).

Provide instead a "missed-spam" folder and a "not-spam" folder. Have
your people copy/move miscategorized emails into those, and learn from
those folders.

PM> 2. How often should I run sa-learn ?  Users here for the most part get
PM> mail in their inbox and then after reading it move it to some other sub
PM> folder ... (of which everyones is different, and some have over 100).

On single-domain systems I normally run it hourly.

PM> Are there any downfalls to running a site wide one ?  What is the best
PM> method of doing this if this is a better method.  Currently I plan to
PM> use this to learn the spam.  Does anyone see any problems.
PM> (Note:  this assumes it is being run as a particular user.)

Some people prefer system-wide, others domain-wide, others
user-specific.  YMMV. Feasibility might be the more important
criteria, since all three can work.

Bob Menschel




BAYES...sitewide or per-user or not at all?

2005-04-08 Thread Gerald V. Livingston II
We are a small ISP. Our primary domain mail server currently has a few
over 6000 addresses and inbound volume is between 100K and 200K/day with
probably 95%+ of that being spam.

I'm going through the wiki now and should be able to have SA running on the
new test box soon with per-user preferences on virtual domains.

My question is, should I set up BAYES at all? I'm fairly certain domain
level BAYES would be a bad thing with our demographic. We have people with
family, friends, or business partners in APNIC countries, we have customers
who frequent spam havens (online porn gatherers), we have ultra religious
customers, and we have middle of the road customers.

I'm afraid domain wide bayes would show up as many FPs for the first two
groups or many FNs for the last two -- or the database would just stay
hosed up with customers shoving conflicting spam and ham into the learning
folders.

I'm not sure how resource efficient per-user BAYES would be. Will it kill
the machine as the user base grows or the spam volume increases?

For the first few weeks this system will be running everything on a single
machine. When it's operational and the customers have been moved from the
old server I will turn that server into a gateway box to handle the
scanning duties and leave the IMAP/WebMail and the MySQL database on the
main box.

Who's running a successful large user base/high volume site with per-user
BAYES?

Gerald



Re: RDJ and bogus virus warnings rule

2005-04-08 Thread Chris
On Thursday 07 April 2005 07:42 pm, you wrote:

> Chris, I suspect that your address is blacklisted by timj.co.uk for some
> reason.
>
> Is there any chance you were doing something which might be regarded as
> abusive, such as updating via rdj every 5 minutes?
>
> Any chance he might have blacklisted your whole IP block because a virus
> infected host was attacking his website?
>
> The reason I suspect blacklisting is I can wget the file just fine:
>
>
> $ wget http://www.timj.co.uk/linux/bogus-virus-warnings.cf
> --20:38:14--  http://www.timj.co.uk/linux/bogus-virus-warnings.cf
>=> `bogus-virus-warnings.cf'
> Resolving www.timj.co.uk... done.
> Connecting to www.timj.co.uk[212.69.37.57]:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 106,930 [text/plain]

Disregard that request for contact address, I found it on his web page.

-- 
Chris
Registered Linux User 283774 http://counter.li.org
19:54:09 up 1 day, 2:22, 2 users, load average: 0.93, 1.14, 0.97
Mandrake Linux 10.1 Official, kernel 2.6.8.1-12mdk

* woot is now known as woot-dinner
* Knghtbrd sprinkles a little salt on woot
 I've never had a woot before...  Hope they taste good
 n!
 don't eat me!
* Knghtbrd decides he does not want a dinner that talks to him...  hehe



Re: RDJ and bogus virus warnings rule

2005-04-08 Thread Chris
On Thursday 07 April 2005 07:42 pm, Matt Kettler wrote:
> Chris wrote:
> >http://www.timj.co.uk/linux/bogus-virus-warnings.cf
> >--19:28:30--  http://www.timj.co.uk/linux/bogus-virus-warnings.cf
> >   => `bogus-virus-warnings.cf'
> >Resolving www.timj.co.uk... 212.69.37.57
> >Connecting to www.timj.co.uk[212.69.37.57]:80... connected.
> >HTTP request sent, awaiting response... 403 Forbidden
> >19:28:30 ERROR 403: Forbidden.
> >
> >
> >
> > [EMAIL PROTECTED] rulesdujour]# wget
>
> Chris, I suspect that your address is blacklisted by timj.co.uk for some
> reason.
>
> Is there any chance you were doing something which might be regarded as
> abusive, such as updating via rdj every 5 minutes?
>
> Any chance he might have blacklisted your whole IP block because a virus
> infected host was attacking his website?
>
> The reason I suspect blacklisting is I can wget the file just fine:
>
>
> $ wget http://www.timj.co.uk/linux/bogus-virus-warnings.cf
> --20:38:14--  http://www.timj.co.uk/linux/bogus-virus-warnings.cf
>=> `bogus-virus-warnings.cf'
> Resolving www.timj.co.uk... done.
> Connecting to www.timj.co.uk[212.69.37.57]:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 106,930 [text/plain]
>

No, I run RDJ once every 24hrs from a cronjob.  Blacklisting is possible, 
especially with earthlink.  Is there any way to contact Tim to find out what 
the possible reason for the 403 is?

-- 
Chris
Registered Linux User 283774 http://counter.li.org
19:47:45 up 1 day, 2:15, 2 users, load average: 1.52, 1.40, 0.92
Mandrake Linux 10.1 Official, kernel 2.6.8.1-12mdk

Pretend to spank me -- I'm a pseudo-masochist!



Re: RDJ and bogus virus warnings rule

2005-04-08 Thread Matt Kettler
Chris wrote:

>http://www.timj.co.uk/linux/bogus-virus-warnings.cf
>--19:28:30--  http://www.timj.co.uk/linux/bogus-virus-warnings.cf
>   => `bogus-virus-warnings.cf'
>Resolving www.timj.co.uk... 212.69.37.57
>Connecting to www.timj.co.uk[212.69.37.57]:80... connected.
>HTTP request sent, awaiting response... 403 Forbidden
>19:28:30 ERROR 403: Forbidden.
>
>  
>
> [EMAIL PROTECTED] rulesdujour]# wget


Chris, I suspect that your address is blacklisted by timj.co.uk for some
reason.

Is there any chance you were doing something which might be regarded as
abusive, such as updating via rdj every 5 minutes?

Any chance he might have blacklisted your whole IP block because a virus
infected host was attacking his website?

The reason I suspect blacklisting is I can wget the file just fine:


$ wget http://www.timj.co.uk/linux/bogus-virus-warnings.cf
--20:38:14--  http://www.timj.co.uk/linux/bogus-virus-warnings.cf
   => `bogus-virus-warnings.cf'
Resolving www.timj.co.uk... done.
Connecting to www.timj.co.uk[212.69.37.57]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 106,930 [text/plain]

100%[====================================>] 106,930      186.80K/s    ETA 00:00

20:38:15 (186.80 KB/s) - `bogus-virus-warnings.cf' saved [106930/106930]





Re: RDJ and bogus virus warnings rule

2005-04-08 Thread Chris
On Thursday 07 April 2005 06:54 pm, .rp wrote:
> I did not have a problem downloading it this week.
Just tried again with RDJ and a manual wget, with the same '403' error.  Hope 
Tim gets it fixed soon.  Is there another source for this rule set?

The following rules had errors:
Tim Jackson's (et al) bogus virus warnings was not retrieved because of: 403
from http://www.timj.co.uk/linux/bogus-virus-warnings.cf.
Additional Info:
403
[EMAIL PROTECTED] rulesdujour]# wget 
http://www.timj.co.uk/linux/bogus-virus-warnings.cf
--19:28:30--  http://www.timj.co.uk/linux/bogus-virus-warnings.cf
   => `bogus-virus-warnings.cf'
Resolving www.timj.co.uk... 212.69.37.57
Connecting to www.timj.co.uk[212.69.37.57]:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
19:28:30 ERROR 403: Forbidden.

-- 
Chris
Registered Linux User 283774 http://counter.li.org
19:27:09 up 1 day, 1:55, 1 user, load average: 0.35, 0.42, 0.42
Mandrake Linux 10.1 Official, kernel 2.6.8.1-12mdk

quark:
The sound made by a well bred duck.