Re: Spam and the Internet [Was: xxxl spam]

2006-04-17 Thread Alan Premselaar
Matt Kettler wrote:
...snip...

 Here's one, if you want to see it:
 
 http://mywebpages.comcast.net/mkettler/spam.jpg
 
 
 There's pretty close to zero chance that anyone in the US is going to hop on a
 plane and fly to Guatemala to buy ordinary lawn care products from a small
 store. But that's the kind of ads I'm getting.

but they've got heart-shaped pancake molds... you wouldn't fly to
Guatemala for that?  and at Q.29?! what a bargain!


(heh, i couldn't resist)


Re: Non-English languages

2006-04-17 Thread Alan Premselaar
Kenneth Porter wrote:
...snip...

 
 To those of you who've successfully learned 2nd and 3rd languages as an
 adult, what do you recommend for accomplishing that?

Kenneth,

  I started learning Japanese when I was 30. (I feel so old saying it
like that) ... anyways, I started with a teach yourself Japanese book
and a computer program to help.  after that I took courses after work at
my local community college.  *THEN* I moved to Japan and really started
to learn :p

Anyways, I've learned a number of programming languages since I was
young.  I applied the same techniques to learning Japanese (specifically
with reading/writing (or typing as the case may be)) and made sure I had
good reference materials handy.

also, I got involved with the Japanese communities on iVisit which
helped a lot too.

alan


RE: Filter check request

2006-04-17 Thread Bowie Bailey
Owen Mehegan wrote:
 I've upgraded to SA 3.1.1 and now both messages hit solidly as spam.
 I also don't see the ALL_TRUSTED mistake, so I'm guessing that was
 caused by the trust code mismatch you mentioned. Thanks!  

The trust path, IMO, is too important to be left to chance.  The
trust path is used behind the scenes in all sorts of ways.  Read the
wiki and the Mail::SpamAssassin::Conf man page and specify
trusted_networks and internal_networks manually so you know they are
right.

http://wiki.apache.org/spamassassin/TrustPath

http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.ht
ml#network_test_options

Sorry about the long url.  You may have to piece that last one back
together and paste it into the browser.

-- 
Bowie


Managing Spamassassin Data

2006-04-17 Thread Max Clark
Hi all,

After having spamd exit on me a couple of times (still no idea why), I
decided to put spamd under daemontools control (run file below). While
this has resulted in the stability I was looking for, I am now
presented with a number of growing log/spamassassin files - i.e.:

/service/spamd/razor-agent.log
/root/.spamassassin/auto-whitelist
/root/.spamassassin/bayes_journal
/root/.spamassassin/bayes_seen
/root/.spamassassin/bayes_toks

My question breaks into several parts:

1. The startup script from the FreeBSD port for spamd ran the service
as root - is there any reason not to switch spamd to the qpsmtpd
user/group?

2. Is there a way I can put the razor-agent.log into multilog? If not,
how do I rotate this log file?

3. My experience with Bayes and AWL on amavisd-new is that these files
will only grow. What is the proper approach to pruning this data?

4. I am considering using an external MySQL database for my Bayes
database - how much should I expect this to thrash my server, and what
is the proper approach to pruning this data?

Thanks in advance,
Max

#!/bin/sh
exec 2>&1 \
sh -c '
 exec \
   /usr/local/bin/spamd \
   -x \
   --socketpath=/var/run/spamd/spamd \
   -s stderr
'

--
 Max Clark
 http://www.clarksys.com


Re: Managing Spamassassin Data

2006-04-17 Thread Matt Kettler
Max Clark wrote:
 Hi all,
 
 After having spamd exit on me a couple of times (still no idea why), I
 decided to put spamd under daemontools control (run file below). While
 this has resulted in the stability I was looking for, I am now
 presented with a number of growing log/spamassassin files - i.e.:
 
 /service/spamd/razor-agent.log
 /root/.spamassassin/auto-whitelist
 /root/.spamassassin/bayes_journal
 /root/.spamassassin/bayes_seen
 /root/.spamassassin/bayes_toks
 
 My question breaks into several parts;
 
 1. The startup script from the FreeBSD port for spamd ran the service
 as root - is there any reason not to switch spamd to the qpsmtpd
 user/group?

I would not start it as qpsmtpd, as spamd needs privs to bind its port. However,
if you're not doing multi-user bayes, you can start spamd with -u.

Also, be aware: the Bayes files above in root's home directory should never
actually get used by spamd. Spamd always setuids itself to nobody if it finds
itself running as root when called to scan mail. Normally SA gets started as
root, then setuids to match the userid that calls spamc. However, if root is
calling spamc, then SA winds up using this safety and setuid'ing to nobody.

If you start spamd with -u, it will setuid to the specified user, without regard
for what user called spamc. This should show up as a bunch of spamds that start
as root, and switch to the -u user when scanning.
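
As a rough illustration of the user-switching behaviour described above, here is a minimal Python sketch. It only models the description in this message, not spamd's actual code; the function name effective_user and its parameters are made up for the example.

```python
from typing import Optional


def effective_user(calling_user: str, dash_u: Optional[str] = None) -> str:
    """Model which user spamd scans as, per the description above.

    calling_user: the user that invoked spamc (hypothetical parameter).
    dash_u: the value given to spamd's -u option, if any.
    """
    if dash_u is not None:
        # With -u, spamd switches to that user regardless of the caller.
        return dash_u
    if calling_user == "root":
        # Safety net: never scan mail as root, fall back to nobody.
        return "nobody"
    # Default: match the user that called spamc.
    return calling_user


if __name__ == "__main__":
    for caller, dash_u in [("root", None), ("alice", None), ("root", "spamd")]:
        print(caller, dash_u, "->", effective_user(caller, dash_u))
```

The point of the sketch is that files under /root/.spamassassin are only reachable in none of these three cases, which is why they should not be growing because of spamd itself.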


 
 2. Is there a way I can put the razor-agent.log into multilog? If not,
 how do I rotate this log file?

Can't say as I know. This is generated by the razor tools themselves, so man
razor-agent.conf would be the reference here. However, judging from the docs,
this only supports plain dumb file access, so there's probably no safe way to
rotate it short of killing off SA.

http://razor.sourceforge.net/docs/doc.php?type=pod&name=razor-agent.conf

 
 3. My experience with Bayes and AWL on amavisd-new is that these files
 will only grow, what is the proper approach to pruning this data?

bayes_journal and bayes_toks should prune on their own during opportunistic
expiry. However, you can run sa-learn --force-expire to force this.

bayes_seen does not get pruned by expiry, but it can be safely deleted in SA
3.1.0 and SA will re-create it. (or so the devels claim, I've not tested this)

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2975

auto-whitelist can be pruned by running the check_whitelist script from the tools
directory of the tarball with the --clean parameter. (Note: most distro
packages do not install this tool, so just grab it from a tarball download if
you don't have it.)


usage: check_whitelist [--clean] [--min n] [dbfile]

--min specifies the minimum number of hits a given AWL entry needs for
it to be considered worth keeping. It defaults to 2 if not specified (this
prunes all one-off entries).
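
To make the effect of the minimum-hits threshold concrete, here is a small Python sketch of the pruning rule. It is only a model: the real auto-whitelist is a DB file handled by check_whitelist, while the dict layout, key format and function name below are assumptions for illustration.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the AWL database:
# key -> (hit count, total score). The real file is a DB, not a dict.
AwlEntries = Dict[str, Tuple[int, float]]


def prune_awl(entries: AwlEntries, min_hits: int = 2) -> AwlEntries:
    """Keep only the entries seen at least min_hits times."""
    return {key: value for key, value in entries.items() if value[0] >= min_hits}


if __name__ == "__main__":
    sample: AwlEntries = {
        "friend@example.com|ip=192.0": (12, -6.0),  # regular correspondent
        "oneoff@example.net|ip=10.1": (1, 4.5),     # seen once, gets pruned
    }
    print(sorted(prune_awl(sample)))
```

With the default threshold of 2, only the one-off entry is dropped; raising the threshold prunes more aggressively.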


 
 4. I am considering using an external Mysql database for my Bayes
 database - how much should I expect this to trash my server, and what
 is the proper approach to pruning this data?
 

Dunno, I'm not a SQL-bayes user.


Re: Managing Spamassassin Data

2006-04-17 Thread Gary D. Margiotta




2. Is there a way I can put the razor-agent.log into multilog? If not,
how do I rotate this log file?



For myself on FreeBSD, I installed from source, not from the port, so adjust your
configs as necessary, but I use the newsyslog facility (/etc/newsyslog.conf) to
rotate the log files with the nightly checks:


The maillog is rotated nightly:
/var/log/maillog    640  120   *   @T00  JC

So, I added another entry for my spam log:
/var/log/spam.log   640  120   *   @T00  JC

I've added several logfiles to the file to auto-rotate, such as named, and 
it works like a charm.


My relevant config bits:

How I start spamd:
/usr/local/bin/spamd --daemonize --username spamd --max-children=20 
--min-spare=5 --pidfile /home/spamd/spamd.pid -s local5

(notice the local5 part at the end, which defines the local5 syslog 
identifier)


The relevant syslog config:
local5.*            /var/log/spam.log

Hope this helps.

-Gary


Re: Managing Spamassassin Data

2006-04-17 Thread Kris Deugau

Max Clark wrote:

2. Is there a way I can put the razor-agent.log into multilog? If not,
how do I rotate this log file?


Set up a cron job to run 'find / -name razor-agent.log |xargs rm -f'.  <g>

I've found razor is a little indiscriminate about where it spews this 
log file;  I've found it in some truly bizarre places.  That may have 
something to do with the razor-agents version I'm running (don't even 
recall - it's probably a little old and outdated).


If at all possible, look into disabling it entirely.  IIRC
it's just a transcript of what razor does when it's called to scan a
message.


-kgd


Re: Managing Spamassassin Data

2006-04-17 Thread Theo Van Dinter
On Mon, Apr 17, 2006 at 05:44:57PM -0400, Kris Deugau wrote:
 2. Is there a way I can put the razor-agent.log into multilog? If not,
 how do I rotate this log file?
 
 Set up a cron job to run 'find / -name razor-agent.log |xargs rm -f'.  <g>

Alternately, put the following in razor-agent.conf:

debuglevel = 0

The log still gets created, but nothing ever goes into it.

-- 
Randomly Generated Tagline:
You tell 'em Cemetery, You are so grave.


non-fuzzy body parts in subject: missed

2006-04-17 Thread Linda Walsh

I have been receiving a spate of short messages that don't seem
to trigger enough default rules to be knocked out.  I was
investigating and noticed a discrepancy [bug?] in the rules.

One particular email refers to the uniquely Male-Body-Part starting
w/P, let's call it MBP for purposes of discussion.


It gets hit by a '20' rule for body parts in the message body,
but I noticed it doesn't get anything for the subject:
Want a Bigger MBP?  A '25_replace' rule is present for fuzzy
MBP's, but doesn't seem to catch unfuzzy ones. 


So I guess questions might be:
   1) should 'fuzzy' rules match non-fuzzy targets as well
  as fuzzy ones?
   2) Should there be some normalization adjustment for
short messages? 


  I'm thinking a scale factor rather than an absolute score
to add, -- reflecting the general idea that short messages
are not bad, but if you are scoring on the bad side, a
multiplier (ex. 1.1 or 1.2) would increase the score of a message
that is already being sized up as bad.

  Does SA support any multiplier type rules?  Should it, or
rather, do people feel this is a good idea?
i.e.: RULENAME *1.1 (0,*1.1,0,*1.1) type format?

-l







Re: non-fuzzy body parts in subject: missed

2006-04-17 Thread Matt Kettler
Linda Walsh wrote:
 I have been receiving a spate of short messages that don't seem
 to trigger enough default rules to be knocked out.  I was
 investigating and noticed a discrepancy [bug?] in the rules.
 
 One particular email refers to the uniquely Male-Body-Part starting
 w/P, let's call it MBP for purposes of discussion.

 
 
 It gets hit by a '20' rule for body parts in the message body,
 but I noticed it doesn't get anything for the subject:

Yes it does. The text of the subject line will match against any body rule. SA
prepends it to the body text so we don't need a massive duplication of rules to
cover both body and subject.
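
A minimal Python sketch of that idea follows. It is a simplified model rather than SA's implementation; the function name body_rule_hits and the sample pattern are invented for the example.

```python
import re


def body_rule_hits(pattern: str, subject: str, body: str) -> bool:
    """Simplified model: a body rule is matched against subject + body."""
    # The subject is placed in front of the body, so a single body rule
    # covers text that appears in either place.
    searchable = subject + "\n" + body
    return re.search(pattern, searchable, re.IGNORECASE) is not None


if __name__ == "__main__":
    # The phrase appears only in the subject, yet the body rule still hits.
    print(body_rule_hits(r"\bwant a bigger\b", "Want a Bigger MBP?", "Click here."))
```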

 Want a Bigger MBP?  A '25_replace' rule is present for fuzzy
 MBP's, but doesn't seem to catch unfuzzy ones.
 So I guess questions might be:
1) should 'fuzzy' rules match non-fuzzy targets as well
   as fuzzy ones?

IMHO, no. I think there should be two rules with separate scores. In the above
example the scores would be pretty much the same.

However, consider the word viagra: an obfuscation is a clear sign of spam.
The un-obfuscated word is a weaker sign of spam in this case, because it could
be a joke or part of a medical discussion of some form.
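
Here is a rough Python sketch of keeping the two cases as separate rules with separate scores. The rule names, the character classes and the scores are all hypothetical; SA's real fuzzy rules are built differently, and this only shows the shape of the idea.

```python
import re
from typing import Dict

# Hypothetical rule names, patterns and scores, for illustration only.
RULES = {
    "PLAIN_VIAGRA": (re.compile(r"\bviagra\b", re.IGNORECASE), 1.0),
    # The negative lookahead keeps the fuzzy rule from matching the plain word.
    "FUZZY_VIAGRA": (
        re.compile(r"\b(?!viagra\b)v[i1!|][a@4]gr[a@4]\b", re.IGNORECASE),
        3.0,
    ),
}


def score(text: str) -> Dict[str, float]:
    """Return the score contributed by each rule that hits the text."""
    return {name: pts for name, (rx, pts) in RULES.items() if rx.search(text)}


if __name__ == "__main__":
    print(score("cheap v1agra here"))   # only the fuzzy rule should hit
    print(score("a study of Viagra"))   # only the plain rule should hit
```

Because the fuzzy rule excludes the plain spelling, each rule can carry its own score: high for the obfuscation, lower for the ordinary word.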

2) Should there be some normalization adjustment for
 short messages?
   I'm thinking a scale factor rather than an absolute score
 to add, -- reflecting the general idea that short messages
 are not bad, but if you are scoring on the bad side, a
 multiplier (ex. 1.1 or 1.2) would increase the score of a message
 that is already being sized up as bad.
 
   Does SA support any multiplier type rules? 

No.

 Should it, or
 rather, do people feel this is a good idea?

I don't feel that would be a good idea. Bear in mind this would also make a
good message (i.e., one at -1.0) score even better. It just doesn't make sense
to me to have something which merely acts as a score amplifier instead of a
score adjustment.

Performing any kind of GA to establish a reasonable multiplier value for these
would be a logistical nightmare.

You also get into an issue of order-of-operations. Does this multiplier apply to
the current score as of the moment the rule hits? Or, after the total message
score is calculated, do you make a second pass and factor in all the multipliers,
taking a slight performance hit for the extra calculation run?
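
To show why the order matters, here is a small Python sketch comparing the two options. SA has no multiplier rules, so everything here (the hit representation, the function names, the sample values) is hypothetical.

```python
from typing import List, Tuple

# Each hit is ("add", value) or ("mul", factor); a made-up representation.
Hit = Tuple[str, float]


def score_inline(hits: List[Hit]) -> float:
    """Apply each multiplier to the running score at the moment it hits."""
    total = 0.0
    for kind, value in hits:
        if kind == "add":
            total += value
        else:
            total *= value
    return total


def score_second_pass(hits: List[Hit]) -> float:
    """Sum all additive scores first, then apply multipliers in a second pass."""
    total = sum(value for kind, value in hits if kind == "add")
    for kind, value in hits:
        if kind == "mul":
            total *= value
    return total


if __name__ == "__main__":
    hits: List[Hit] = [("add", 2.0), ("mul", 1.2), ("add", 3.0)]
    print("inline:     ", score_inline(hits))       # (2.0 * 1.2) + 3.0
    print("second pass:", score_second_pass(hits))  # (2.0 + 3.0) * 1.2
    # A good message is amplified too: -1.0 becomes more negative.
    print("ham:        ", score_second_pass([("add", -1.0), ("mul", 1.2)]))
```

The same set of hits yields two different totals depending on when the multiplier is applied, and in the inline case the result also depends on the order in which rules happen to run.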



Adding RBLs

2006-04-17 Thread stevek
We are currently testing SA 3.1.0 - as our installation may end up being 
quite large. For several years we have run our own dnsrbl lists and would 
like to incorporate them into SA. Most are IPV4sets, but we do have one 
RHBL list. Unfortunately, we have not been successful in getting the rules 
to fire. We have tried adding them to both a variation of 
20_dnsbl_tests.cf and to the local.cf.



Here is a sample of the type of rule we are loading:

## Local RBL

header   RLBL_OSH_RBL  rbleval:check_rbl('rblos', 'rbl.onshore.com.')
describe RLBL_OSH_RBL  rbl.onshore.com
tflags   RLBL_OSH_RBL  net

header   RLBL_OSH_RBL  rbleval:check_rbl_results_for('rblos', '127.0.0.4')
describe RLBL_OSH_RBL  Host in rbl.onshore.com
tflags   RLBL_OSH_RBL  net

score    RLBL_OSH_RBL  3.0

spamassassin -D --lint shows no errors; however, the rules don't seem to
get called, or fire, when we send a test mail from a host listed in the
RBL. Other dnsrbls (e.g. spamcop, sorbs) seem to work fine.



Any help would be appreciated.

TIA - sjk



Steven Kent
onShore Networks
http://www.onshore.com
fingerprint: pub  1024D/D2779F66 2004-04-07


Re: non-fuzzy body parts in subject: missed

2006-04-17 Thread Linda Walsh

Matt Kettler wrote:

Yes it does.. the text of the subject line will match against any body rule. SA
pre-pends this so we don't have to have a massive duplication of rules to cover
both body and subject.

---
Ah.  Didn't know that.  Different tools, different lingo for
message, message header, message body.



Want a Bigger MBP?  A '25_replace' rule is present for fuzzy
MBP's, but doesn't seem to catch unfuzzy ones.
So I guess questions might be:
   1) should 'fuzzy' rules match non-fuzzy targets as well
  as fuzzy ones?


IMHO, no. I think there should be two rules with separate scores. In the above
example the scores would be pretty much the same.

---
I agree on keeping the rules separate, just didn't know fuzzy
Subj was included in body.


However consider the word viagra, an obfuscation is a clear sign of spam.
Un-obfuscated is a less strong sign of spam in this case, because it could be a
joke or a conversation with a medical discussion of some form.

---
Agreed.


Should it, or
rather, do people feel this is a good idea?


I don't feel that would be a good idea. Bear in mind this would also make a
good message (ie: one at -1.0) be more good. It just doesn't make sense to
me to have something which merely acts as a score amplifier instead of a score
adjustment.

---
I realized it would increase goodness as well, but I guess I didn't
see that as much of an issue if the multiplier was applied last.


Performing any kind of GA to establish a reasonable multiplier value for these
would be a logistical nightmare.

---
:-)  True, but that doesn't mean SA couldn't support a post
multiplier! :-)  I can see its use would be somewhat limited though, as I'm
not sure under what other conditions one would want such a scaling, so its loss
in one circumstance seems minor.  Sometimes I get overfocused on the problem,
and blow up its severity in my mind.  Uh, maybe I can blame it on the original
spam's intent of increasing small problems? ;^?

Feedback is good! :-)

Tnx,
Linda




Re: Adding RBLs

2006-04-17 Thread Matt Kettler
stevek wrote:
 We are currently testing SA 3.1.0 - as our installation may end up
 being quite large. For several years we have run our own dnsrbl lists
 and would like to incorporate them into SA. Most are IPV4sets, but we
 do have one RHBL list. Unfortunately, we have not been successful in
 getting the rules to fire. We have tried adding them to both a
 variation of 20_dnsbl_tests.cf and to the local.cf.


 Here is a sample of the type of rule we are loading:

 ## Local RBL

 header   RLBL_OSH_RBL  rbleval:check_rbl('rblos', 'rbl.onshore.com.')
 describe RLBL_OSH_RBL  rbl.onshore.com
 tflags   RLBL_OSH_RBL  net

 header   RLBL_OSH_RBL  rbleval:check_rbl_results_for('rblos', '127.0.0.4')
 describe RLBL_OSH_RBL  Host in rbl.onshore.com
 tflags   RLBL_OSH_RBL  net

 score    RLBL_OSH_RBL  3.0

Two things:

First, rename the first grouping of header, describe and tflags to
__RLBL_OSH_RBL. Note the addition of double-underscore at the beginning.
This choice of naming is key if you don't want the base rule to fire off
with a score of 1.0 if the RBL returns anything at all.

You cannot ever have two rules with the same name. If this ever happens,
the second declaration over-writes the first. This is very much
intentional, as it allows local configurations to patch the default
rulesets, if they so desire, by over-writing the rule with a different
version. Basically, your setup using the same name causes the second
group of three to over-write and destroy the first three, preventing the
rule from running because there is no check_rbl call.

Second, I'd also suggest changing check_rbl_results_for to
check_rbl_sub. check_rbl_results_for is deprecated, and present for
backward compatibility only.
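
The overwrite behaviour can be modelled in a few lines of Python. This is a toy parser, not SA's config loader: load_rules is a made-up helper, ORIGINAL is the config as posted, and FIXED is simply that config with the two changes suggested above applied (untested against a live SA install).

```python
from typing import Dict

ORIGINAL = """
header   RLBL_OSH_RBL  rbleval:check_rbl('rblos', 'rbl.onshore.com.')
describe RLBL_OSH_RBL  rbl.onshore.com
tflags   RLBL_OSH_RBL  net
header   RLBL_OSH_RBL  rbleval:check_rbl_results_for('rblos', '127.0.0.4')
describe RLBL_OSH_RBL  Host in rbl.onshore.com
tflags   RLBL_OSH_RBL  net
score    RLBL_OSH_RBL  3.0
"""

FIXED = """
header   __RLBL_OSH_RBL  rbleval:check_rbl('rblos', 'rbl.onshore.com.')
describe __RLBL_OSH_RBL  rbl.onshore.com
tflags   __RLBL_OSH_RBL  net
header   RLBL_OSH_RBL  rbleval:check_rbl_sub('rblos', '127.0.0.4')
describe RLBL_OSH_RBL  Host in rbl.onshore.com
tflags   RLBL_OSH_RBL  net
score    RLBL_OSH_RBL  3.0
"""


def load_rules(config: str) -> Dict[str, Dict[str, str]]:
    """Toy model of 'last definition wins' for rule configuration lines."""
    rules: Dict[str, Dict[str, str]] = {}
    for line in config.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, name, value = line.split(None, 2)
        # A later line with the same rule name and keyword replaces the earlier one.
        rules.setdefault(name, {})[keyword] = value
    return rules


if __name__ == "__main__":
    for label, text in (("original", ORIGINAL), ("fixed", FIXED)):
        for name, parts in load_rules(text).items():
            print(label, name, parts["header"])
```

In the original, only one rule survives and its header is the results check, so the check_rbl lookup is gone. In the fixed version, the double-underscore rule keeps the lookup and the scored rule tests its result.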



 spamassassin -D --lint shows no errors; however the rules don't seem
 to get called, or fire when we send a test mail from a host listed in
 the RBL. Other dnsrbls -- ie. spamcop, sorbs  -- seem to work fine.


 Any help would be appreciated.
YW.