Re: uri(bl) checks don't detect URLs with capitalized Http

2005-04-14 Thread mewolf1
In an older episode (Thursday 14 April 2005 00:54), Theo Van Dinter wrote:

 In this case, however, it's not clear if he's running something like a
 Fedora RPM version of SpamAssassin where he could just go ahead and update
 at will, or if it's something like Barracuda/etc, where you really can't
 just go changing things on your own.  The flip side of that of course is
 that you'll have vendor support who you can call and make requests of. ;)

at home, i am running debian linux with SpamAssassin version 3.0.2 running on 
Perl version 5.8.4, or as debian puts it:
Installed: 3.0.2-1
Candidate: 3.0.2-1
== nothing newer available in debian.

at work, we use a vendor's pre-installed SpamAssassin version 3.0.2 running 
on Perl version 5.8.5 and Fedora Core release 3 (Heidelberg), rpms built by 
the vendor:
spamassassin-3.0.2-1
spamassassin-tools-3.0.2-1

i will have to find out exactly which modifications the vendor has applied to 
which source. anyway, further modifications are possible, either by the 
vendor or by us, the owners. learning at home how to apply the fix will make 
it easier to judge / request / apply the necessary changes at work.



Re: uri(bl) checks don't detect URLs with capitalized Http

2005-04-14 Thread Daryl C. W. O'Shea
Here's the diff:
http://svn.apache.org/viewcvs.cgi/spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm?rev=148873&r1=125891&r2=148873&makepatch=1&diff_format=u


Re: Recommendation on SARE rules to add.

2005-04-14 Thread Robert Menschel
Hello Robert,

Tuesday, April 12, 2005, 10:24:54 PM, you wrote:

RM SA 3.0

RM I was wondering if anybody had a recommendation for an initial SARE set
RM of rules to add.  I am not exactly satisfied with my amount of FN's
RM currently.  Any ideas would be appreciated.

First -- I'm in full agreement with all of the other
suggestions/considerations offered that I've seen.

And since I haven't seen any specific rule set files, I'll offer my
suggestions there:

70_sare_evilnum0.cf
70_sare_genlsubj0.cf
70_sare_header0.cf
70_sare_html0.cf
70_sare_uri0.cf

The files above are built, selected, and regularly rechecked to avoid
any hits against ham. They should be safe for everyone.

70_sare_specific.cf
70_sare_oem.cf
70_sare_spoof.cf
70_sare_unsub.cf
70_sare_random.cf
72_sare_redirect_post3.0.0.cf
88_FVGT_Tripwire.cf

These aren't quite as safe, but still should be suitable for the great
majority of systems.

70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
72_sare_bml_post25x.cf
chickenpox.cf
weeds_2.cf

A little bit more risky, and might FP if one of your users runs an
adult book store, is a mortgage broker, or likes to *em*pha*size*
words, etc.

70_sare_evilnum1.cf
70_sare_genlsubj1.cf
70_sare_header1.cf
70_sare_html1.cf
70_sare_uri1.cf

Like the first set, but a little bit more risky. Will hit ham, but
should not cause FPs.

If you are located in the USA/England/Canada/Australia, and do not
receive foreign-language non-spam, then you can also benefit from
70_sare_genlsubj_eng.cf
70_sare_header_eng.cf
70_sare_html_eng.cf
70_sare_uri_eng.cf

I guess we really should put SARE guidelines like this onto a page
linked to http://wiki.apache.org/spamassassin/CustomRulesets -- I'll
get that started after I've put my income taxes to bed...

Bob Menschel




Re[4]: Arithmetic score for replaced O's and I's?

2005-04-14 Thread Robert Menschel
Hello mewolf1,

Tuesday, April 12, 2005, 6:37:15 PM, you wrote:

mgn In an older episode (Wednesday 13 April 2005 02:57), Robert Menschel wrote:
 Send me your t1r3d, h0m3|ess, hun6ry, un\/\/anted [EMAIL PROTECTED], and
 I'|| f1nd a 600D horme 4 them...
 
 (Not the entire spam emails, please -- just the obfuscations.)

mgn Robert, I just sent you obfuscations privately off list, is that
mgn what you meant?

Perfect.  I built rules for them last night and mass-checked them this
morning. I'll run a few passes to refine them, then have other SARE
ninjas mass-check them to get broader results, and then we'll fine-tune
for performance and hopefully have something published before the end of
the month.

Other contributions more than welcome.

Bob Menschel





Re: report_safe doesn't seem to work since FC3 upgrade

2005-04-14 Thread Daryl C. W. O'Shea
The problem with your setup is with spamass-milter, not SpamAssassin.
The reason you got no responses is that you started your thread 
by replying to a message in the thread titled "--username flag".  I 
don't know what your problem has to do with the --username option, but I 
guess the people reading that thread don't know about your problem.  I'd 
suggest that next time you send a new message (thus starting a new thread) 
rather than hijacking an existing thread.

Daryl


Re: SA randomly sucking up huge amounts of memory

2005-04-14 Thread Robert Menschel
Hello Dennis,

Wednesday, April 13, 2005, 1:24:27 PM, you wrote:

DS A week or two ago, SA started randomly sucking up huge amounts of memory
DS in one or more of the spamd children.  ...

DS I have managed to catch 3 of the messages that it has hung on.
DS ...

I've got some suspicions, but would need to actually have the emails
to verify them.  Any chance you can zip or tar.gz them up (so they
don't tie up my SA system) and mail them to me?

Bob Menschel





RE: Need for a new rule?

2005-04-14 Thread Gray, Richard
 -Original Message-
 From: Stuart Johnston [mailto:[EMAIL PROTECTED] 
 Sent: 13 April 2005 21:42
 To: Andreas Davour
 Cc: users@spamassassin.apache.org
 Subject: Re: Need for a new rule?
 
 Andreas Davour wrote:
  
  The following message has many characteristics in common with much 
  spam I've been getting lately. It's about investments, often shares, 
  stock options or oil. One odd thing about those messages is that they 
  all, like the one quoted below, have the letter 'l' replaced by the 
  pipe character, i.e. '|'.
  

Here we have a large number of obfuscated word rules, including a number
that are related to stocks and shares. We need to be careful as we do
receive legitimate 'forrrward loooking statements' (obfuscated in case
you don't like the phrase) so tend to have things like

(?!millions?)m[1i|][l1|][l1|][l1|][0o]n[5s]? (not checked)

The basic rule is that real people don't try to hide what they are
saying. There does exist a problem with other companies who use
profanity filters. The sender beats their profanity filter by
obfuscating the word, and we catch it because they obfuscated! 
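
As a rough illustration of the kind of pattern quoted above, here is a minimal Python sketch. It is not the rule we actually run: the fourth character class is widened to accept "i" (the posted pattern was flagged as not checked), case-insensitive matching is assumed, and the names OBFUSCATED_MILLION and looks_obfuscated are made up for the example.

```python
import re

# Variant of the pattern quoted above. Assumptions: the fourth character
# class also accepts "i", and matching is case-insensitive.
OBFUSCATED_MILLION = re.compile(
    r"(?!millions?)m[1i|][l1|][l1|][1il|][0o]n[5s]?",
    re.IGNORECASE,
)


def looks_obfuscated(text):
    """Return True if text contains a disguised spelling of 'million(s)'."""
    return OBFUSCATED_MILLION.search(text) is not None


if __name__ == "__main__":
    for sample in ["worth millions", "make m1ll10ns", "mi||ion dollar deal"]:
        print(sample, "->", looks_obfuscated(sample))
```

The negative lookahead is what lets the plain spelling through, so legitimate mail that talks about millions is not penalised while the disguised forms still match.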








Re: yet another Sendmail filter for SpamAssassin daemon spamd

2005-04-14 Thread John Andersen
On Wednesday 13 April 2005 09:57 am, Eugene Kurmanin wrote:
 5. Copy SPAM to the defined mailbox;
 6. Reject SPAM at the DATA stage,
    if SPAM score is greater than defined value;
 7. Log all activities to syslog.

Well, if you are going to reject, why also accept
and copy to mailbox?

Is there more than one threshold, so that you can
reject if it gets a really bad score (like 20 or 30) and
reject but still copy to mailbox if the score is less?

-- 
_
John Andersen




sa-learn doesn't learn

2005-04-14 Thread Raphael Clifford
Hi,
I am trying to set up Bayes classifying for the first time using 
sa-learn.  It looks like it is working but doesn't actually seem to 
be... Here is the output

[raph]$ sa-learn --showdots --mbox --spam 
.thunderbird/gmnjx6hf.default/Mail/mail.plus.net/Junk
. 

Learned from 870 message(s) (1025 message(s) examined).
[raph]$ sa-learn --showdots --mbox --ham 
.thunderbird/gmnjx6hf.default/Mail/mail.plus.net/Inbox
.. 

Learned from 2390 message(s) (2578 message(s) examined).
Now when I do spamassassin -D --lint I get
[...]
debug: bayes: 5790 tie-ing to DB file R/O 
/home/raph/.spamassassin/bayes_toks
debug: bayes: 5790 tie-ing to DB file R/O 
/home/raph/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: using /home/raph/.spamassassin for user state dir
debug: bayes: Not available for scanning, only 1 spam(s) in Bayes DB < 200
debug: bayes: 5790 untie-ing
debug: bayes: 5790 untie-ing db_toks
debug: bayes: 5790 untie-ing db_seen
debug: Score set 1 chosen.
debug:  MIME PARSER START 
debug: main message type: text/plain
debug: parsing normal part
debug: added part, type: text/plain
debug:  MIME PARSER END 
debug: bayes: 5790 tie-ing to DB file R/O 
/home/raph/.spamassassin/bayes_toks
debug: bayes: 5790 tie-ing to DB file R/O 
/home/raph/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, 

Re: sa-learn doesn't learn

2005-04-14 Thread Raphael Clifford
Just to reply to my own message.
It seems to make a crucial difference in which order the spam and 
ham tests are run!  I reran the spam test and it now says I have

(from sa-learn --dump magic)
[...]
0.000  0    881  0  non-token data: nspam
0.000  0   1524  0  non-token data: nham
[...]
So the number of spam has increased to roughly what it should be but the 
number of ham has decreased by 1000!

Can anyone explain this?  It looks like a bug as surely the order of 
execution shouldn't matter?!

Raphael

Re: sa-learn doesn't learn

2005-04-14 Thread Raphael Clifford
Raphael Clifford wrote:
Just to reply to my own message.
It seems to make a crucial difference in which order the spam 
and ham tests are run!  I reran the spam test and it now says I have

Typo:
"spam test" above should be "sa-learn command for the spam folder"
(from sa-learn --dump magic)
[...]
0.000  0    881  0  non-token data: nspam
0.000  0   1524  0  non-token data: nham
[...]

Raphael


RE: sa-learn doesn't learn

2005-04-14 Thread Gray, Richard
From past experience, I would suggest you check the dependencies on
the 3 files that are created by sa-learn. It sounds like it was able to
update bayes_toks but not one of the other files. (Can't remember which.)

First off, run sa-learn --rebuild. I seem to recall this was needed
after running sa-learn (may be wrong).

What do you see when you type 'sa-learn --dump magic'? Execute this
command after each learning stage to see the effect your learning has
had. If you see no change, run the rebuild command, then do it again and
see if there is a change.

If it doesn't work, post your results.
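
To make the before/after comparison less error-prone, here is a small Python sketch that pulls the nspam and nham counters out of 'sa-learn --dump magic' output. It assumes the 3.0-era layout in which each magic line has four numeric columns followed by a "non-token data: ..." label, with the value in the third column. The function name parse_magic_counts is invented for illustration, and the "before" sample is made up (only the "after" figures come from this thread).

```python
def parse_magic_counts(dump_text):
    """Extract nspam/nham counters from `sa-learn --dump magic` output.

    Assumes lines shaped like:
        0.000          0        881          0  non-token data: nspam
    where the third whitespace-separated column carries the value.
    """
    counts = {}
    for line in dump_text.splitlines():
        head, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # not a magic line
        fields = head.split()
        name = label.strip()
        if name in ("nspam", "nham") and len(fields) >= 3:
            counts[name] = int(fields[2])
    return counts


if __name__ == "__main__":
    # "before" is a made-up sample; "after" uses the figures from the thread.
    before = "0.000 0 1 0 non-token data: nspam\n0.000 0 2500 0 non-token data: nham\n"
    after = "0.000 0 881 0 non-token data: nspam\n0.000 0 1524 0 non-token data: nham\n"
    b, a = parse_magic_counts(before), parse_magic_counts(after)
    for key in ("nspam", "nham"):
        print(key, b[key], "->", a[key], "(change %+d)" % (a[key] - b[key]))
```

Running it on the dump taken after each learning stage shows at a glance whether a counter moved in the wrong direction.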

R


RCVD_IN_SORBS_WEB

2005-04-14 Thread Ronan McGlue
Why is the weighting for RCVD_IN_SORBS_WEB scored 0 0 0 and then 0.007?
I know there is probably a good reason for so low a score, but could 
someone explain it to me, please? I have one very irate user who likes 
nothing better than to pick holes in SpamAssassin, which in turn is a 
headache for me. Apparently 1 spam every week is still not good enough 
protection for him.

thanks
ronan



Re: report_safe doesn't seem to work since FC3 upgrade

2005-04-14 Thread Daryl C. W. O'Shea
Chris Harvey wrote:
 The problem with your setup is with spamass-milter, not SpamAssassin.

 And people exclusively ask questions about SA on here? Never ever one on the
 milter?

Perhaps I should have been a little more verbose -- I wasn't saying not 
to ask your question here.  It's not a SpamAssassin config issue; it's 
likely a config issue with the spamass-milter.

Your maillog paste doesn't show the Subject: header being modified, it 
only shows the X-Spam headers being added.  This may be a symptom of the 
'-m' option being present in the call to whatever the spamass-milter 
executable is.

As for the lack of encapsulation, in the 60 seconds or so of looking 
through the spamass-milter documentation to find the -m option info, I 
didn't see any mention of encapsulation -- I don't think it's possible 
with this milter.

Daryl


RE: report_safe doesn't seem to work since FC3 upgrade

2005-04-14 Thread Chris Harvey

 Your maillog paste doesn't show the Subject: header being modified, it
 only shows the X-Spam headers being added.  This may be a symptom of the
 '-m' option being present in the call to whatever the spamass-milter
 executable is.

Yes, exactly. I see the milter doing *some* work, i.e. adding x-header
information, but not doing other work such as changing the subject line as
it used to.

 As for the lack of encapsulation, in the 60 seconds or so of looking
 through the spamass-milter documentation to find the -m option info, I
 didn't see any mention of encapsulation -- I don't think it's possible
 with this milter.

I think I may downgrade the milter as far back as I can and see if that
fixes it. If it does, then we know that this specific version ignores the SA
local.cf commands to change subject and report safe.



Re: Arithmetic score for replaced O's and I's?

2005-04-14 Thread Jim Maul

Something that tries to catch those weird table obfuscations would be 
great ;)  Something like what I posted a while back in the "Extra Sare rules 
for meds" thread.  I don't know if this is possible or not, but...

-Jim


RE: report_safe doesn't seem to work since FC3 upgrade

2005-04-14 Thread Chris Harvey


 I think I may downgrade the milter as far back as I can and see if that
 fixes it. If it does, then we know that this specific version ignores the
 SA local.cf commands to change subject and report safe.

Looks like I may have a different answer. Am testing it now.

-
Check your /etc/rc.d/init.d/spamass-milter file.  The RPM distributed by
RedHat apparently puts -m in EXTRA_FLAGS by default.  Make sure you file a
bugreport with them so they can fix it.

http://savannah.nongnu.org/support/?func=detailitem&item_id=103990



Re: sa-learn - bayes training...

2005-04-14 Thread Jean Caron
Kevin, your assumption is correct, user accounts are on the server and spamc 
is used. I already have the central DB setup using bayes_path in local.cf. 

I think what you are saying confirms what I suspected, but it's still not 
100% clear. Even though I have a central DB, all users must train it 
individually, is that it ? 

For example, if UserA populates the shared folders respectively with ham and 
spam from messages he/she received, if UserB trains the central DB against 
those msgs, it will have no effect for UserA ? All users must individually 
train the central DB even though they train using the same msgs from the 
same shared folders ? 

Sorry if I seem a little dense, but I think I'm getting it. I hope !
Jean 

Kevin Peuhkurinen writes: 

Jean Caron wrote: 

Folks,
I searched the archive, tried different things, yet I need to ask a few 
questions.
I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on linux. Works 
great. Bayes auto-learns ok, I run sa-learn from a dedicated user every 
night for ham and spam. My logs show how many msgs were inspected and how 
many were learned. So far so good.
Here's the part I'm unsure of, I have one centralized bayes DB owned by 
this dedicated user. This user runs sa-learn against two shared 
folders, one for ham and one for spam. All users (only a handful) may 
populate the shared folders. Many thousand msgs have gone through 
sa-learn. I thought this was all too easy...
My problem is bayes does not seem to have any effect whatsoever on the 
amount of spam delivered to INBOXes. I keep receiving these low score 
spam msgs still.
I now suspect this centralized DB, updated by this user alone, may not 
produce the expected results. I've read in the archive that individual 
users should run cron jobs against their own ham and spam folders. The 
issue with this is that only one user has an actual shell defined on the 
system, so the others can't run cron. Then again, that's just a suspicion, 
I may be wrong, and something else may be missing or mis-configured, and 
that's why I'm posting this... I'm a little confused. I don't understand 
how bayes works exactly, so I can't come to any helpful conclusion about 
my setup.
Can anyone see through this and help me understand what is happening ?
Thanks in advance,
Jean 

Jean,
I'm not entirely sure based on the information you provided how spamd is 
getting called, but I'm quite sure that your setup is not doing what you 
expect it to.  I'm guessing since you say that you are using procmail 
that you have user accounts set up on the server itself and that spamc is 
being called as individual users from .forward files.  If this is the 
case, then each user will have a .spamassassin/ directory in their home 
which will contain their own personal Bayes database.   Your problem is 
that you have one particular user who runs sa-learn, so only their Bayes 
DB is being trained (other than through the auto-learning feature, that 
is, which is updating the individual databases).

One easy option you can consider is the use of a global Bayes DB for all 
your users instead of each of them having their own personal DB.   Bayes 
tends to be less effective with global rather than personal databases, but 
only if the individual users are able to do their own training.   You 
could do this fairly easily by setting the bayes_path option in your 
/etc/mail/spamassassin/local.cf file and having it point into the 
.spamassassin/ directory of the user who is doing all the sa-learn training 
(bayes_path is a file-name prefix, so it should end in something like 
.spamassassin/bayes rather than in the directory name itself).

Hope that helps.
Kevin 




Bayes Problems

2005-04-14 Thread J Thomas Hancock
I am having one heck of a time getting Bayes working with SpamAssassin.

I am using postfix 2.2.2 and SA 3.0.2.  Postfix is being run as the user
postfix.  SA is being run as postdrop.  

The following is the output from the syslog.

spamd[22065]: debug: plugin:
Mail::SpamAssassin::Plugin::Hashcash=HASH(0xa8b6820) implements
'parse_config'
spamd[22065]: debug: bayes: 22065 tie-ing to DB file R/O
/home/postdrop/.spamassassin_toks
spamd[22065]: debug: bayes: 22065 tie-ing to DB file R/O
/home/postdrop/.spamassassin_seen
spamd[22065]: debug: bayes: found bayes db version 3
spamd[22065]: debug: bayes: Not available for scanning, only 35 ham(s) in
Bayes DB < 200
spamd[22065]: debug: bayes: 22065 untie-ing
spamd[22065]: debug: bayes: 22065 untie-ing db_toks
spamd[22065]: debug: bayes: 22065 untie-ing db_seen
spamd[22065]: debug: Score set 1 chosen.
spamd[22065]: debug:  MIME PARSER START 
spamd[22065]: debug: main message type: text/plain
spamd[22065]: debug: parsing normal part
spamd[22065]: debug: added part, type: text/plain
spamd[22065]: debug:  MIME PARSER END 
spamd[22065]: debug: using /tmp/spamd-22065-init/.spamassassin for user
state dir
spamd[22065]: debug: bayes: no dbs present, cannot tie DB R/O:
/tmp/spamd-22065-init/.spamassassin/bayes_toks
spamd[22065]: debug: metadata: X-Spam-Relays-Trusted:

Unfortunately I have tinkered with this too much so I really can not list
what I have or have not tried.

Any input would be appreciated.

Thank you,
Tom




RE: Bayes Problems

2005-04-14 Thread Kang, Joseph S.
[clipped for brevity]...

The source of your problem is indicated by

 spamd[22065]: debug: bayes: Not available for scanning, only 35 ham(s) in
 Bayes DB < 200

To use Bayes with SA, you need at least 200 ham and 200 spam messages
learned into the db.
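
A quick way to spot this condition in a long debug log is to search for the "Not available for scanning" line. The Python sketch below does that. The regular expression is written against the log excerpt quoted above, and the names NOT_READY and bayes_shortfall are hypothetical.

```python
import re

# Matches e.g. "Not available for scanning, only 35 ham(s) in Bayes DB < 200".
# The "<" is optional because some archives strip it, and \s+ tolerates a
# line wrap in the middle of the message.
NOT_READY = re.compile(
    r"Not available for scanning,\s+only\s+(\d+)\s+(ham|spam)\(s\)"
    r"\s+in\s+Bayes DB\s*<?\s*(\d+)"
)


def bayes_shortfall(log_text):
    """Return (kind, learned, required) from the first matching line, else None."""
    match = NOT_READY.search(log_text)
    if match is None:
        return None
    learned, kind, required = match.groups()
    return kind, int(learned), int(required)


if __name__ == "__main__":
    line = "debug: bayes: Not available for scanning, only 35 ham(s) in Bayes DB < 200"
    kind, learned, required = bayes_shortfall(line)
    print("need %d more %s message(s)" % (required - learned, kind))
```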

Hope this helps.

-Joe K.


RE: yet another Sendmail filter for SpamAssassin daemon spamd

2005-04-14 Thread Matthew.van.Eerde
John Andersen wrote:
 On Wednesday 13 April 2005 09:57 am, Eugene Kurmanin wrote:
 5. Copy SPAM to the defined mailbox;
 6. Reject SPAM at the DATA stage,
    if SPAM score is greater than defined value;
 7. Log all activities to syslog.
 
 Well if you are going to reject, why also accept
 and copy to mailbox.

I can think of situations where you would reject (in order to not assume 
responsibility for the final delivery of the mail) but still want a copy of 
what you rejected for forensic purposes.  Most of them have to do with 
espionage :)

Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
perl -emap{y/a-z/l-za-k/;print}shift Jjhi pcdiwtg Ptga wprztg,


Mailbox disabled rejection scripting

2005-04-14 Thread Evans, Darrell
Anyone doing any automated methods for catching large numbers of these
rejects and then adding the host into a sendmail access db similar to
vispan?

ruleset=check_rcpt, arg1=[EMAIL PROTECTED], relay=[211.150.242.139],
reject=550 5.2.1 [EMAIL PROTECTED]... Mailbox disabled for this recipient



Re: Still Stuck. bayes

2005-04-14 Thread Peter Marshall
Thank you for the detailed reply.  I made all of the changes you 
suggested.  They were very good, and I will have to see how well they 
work now.

I just had one more question about your last statement, "You don't want to 
sa-learn 200 messages just to learn 5".  I guess it would be doing 
that in the Inbox and Spam directory all the time.  I am sure some 
users, myself included, don't always file messages as quickly as they 
should from their inbox, so they would end up relearning all of that 
mail multiple times ... well, at least running it through sa-learn 
multiple times.  Is there a problem with doing this, and if so, is there a 
better solution for learning ham?

(by the way, I changed one of the moves, to move data from MissedSpam to 
Trash instead of the spam box, so that eliminates learning those 
messages twice)

Thank you a bundle for looking at my script.
Peter
Bowie Bailey wrote:
From: Peter Marshall [mailto:[EMAIL PROTECTED]
I got this book (slightly outdated) called Spamassassin (by O'Reilly).
Anyway, it says if you are going to sa-learn a bunch of directories in
Maildir format you should do the following:
sa-learn --no-rebuild --spam mail/spam
sa-learn --no-rebuild ...blah.
sa-learn --no-rebuild --ham ...blah blah
sa-learn --rebuild
So I give that a go, and it gives messages to use sync and no-sync.

Right, sync and no-sync are the correct options.

If I leave out the --no-sync options ... it gives no output (I 
assume this means nothing got learned).  Here is my script.

You need to run sync once.  It doesn't need to be run for each mailbox.

Do I need to sync ?  I am going to be running this for every user on the
box (as that user of course) in a cron job.

Each user will need to sync after he learns all of his directories.

---The Script
#!/bin/sh
# Inbox
/usr/bin/sa-learn --no-sync --ham --dir ~/Maildir
# Spam Box
/usr/bin/sa-learn --no-sync --spam --dir ~/Maildir/.Spam
# Missed Spam
/usr/bin/sa-learn --no-sync --spam --dir ~/Maildir/.Spam.MissedSpam
# Not Spam
/usr/bin/sa-learn --sync --ham --dir ~/Maildir/.Spam.NotSpam
## Clean up spam Directories.
if [ `\ls ~/Maildir/.Spam.MissedSpam/cur |wc -l` -ne 0 ]; then
  mv ~/Maildir/.Spam.MissedSpam/cur/* ~/Maildir/.Spam
else
  echo Nothing to move in MissedSpam - cur
fi
if [ `\ls ~/Maildir/.Spam.NotSpam/cur |wc -l` -ne 0 ]; then
  mv ~/Maildir/.Spam.NotSpam/cur/* ~/Maildir/cur
else
  echo Nothing to move in NotSpam - cur
fi
---

What I noticed immediately is that the directories you are learning from are
not the ones holding the message files.  Try learning from the 'cur'
directories.  For example:
  /usr/bin/sa-learn --no-sync --ham --dir ~/Maildir/cur
  /usr/bin/sa-learn --no-sync --spam --dir ~/Maildir/.Spam/cur
  /usr/bin/sa-learn --no-sync --spam --dir ~/Maildir/.Spam.MissedSpam/cur
  /usr/bin/sa-learn --sync --ham --dir ~/Maildir/.Spam.NotSpam/cur
Also, after you do the learning, you are moving the messages to the wrong
place.  That first 'mv' line should look like this:
  mv ~/Maildir/.Spam.MissedSpam/cur/* ~/Maildir/.Spam/cur
All of this brings up another question...What is the intended mail flow
here?  I'm a bit confused by the way you are moving messages around.
Normally, after you learn a message, you should move it to a place where it
won't be learned next time.  Otherwise, the messages will continue to pile
up and sa-learn will have to wade through more and more messages each time
you run it.  You don't want sa-learn to have to process 200 messages just to
learn from 5 of them.
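
The corrected sequence can also be sketched as a small Python wrapper, shown here only as an illustration. It assumes the folder layout from the script quoted above and uses only the sa-learn options already mentioned in this thread (--no-sync, --sync, --ham, --spam, --dir). Unlike the quoted script, it issues the sync as a separate final call. The names FOLDERS, build_commands and run_all are hypothetical.

```python
import subprocess

# Folder layout taken from the script quoted above; adjust to your own Maildir.
FOLDERS = [
    ("--ham", "cur"),
    ("--spam", ".Spam/cur"),
    ("--spam", ".Spam.MissedSpam/cur"),
    ("--ham", ".Spam.NotSpam/cur"),
]


def build_commands(maildir, sa_learn="/usr/bin/sa-learn"):
    """Return one --no-sync learn command per folder, then a single --sync."""
    base = maildir.rstrip("/")
    commands = [
        [sa_learn, "--no-sync", kind, "--dir", base + "/" + sub]
        for kind, sub in FOLDERS
    ]
    commands.append([sa_learn, "--sync"])
    return commands


def run_all(maildir):
    """Run the commands in order, stopping at the first failure."""
    for argv in build_commands(maildir):
        subprocess.run(argv, check=True)
```

Building the argument lists separately from running them makes it easy to print and review the commands before letting cron execute them for every user.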
Bowie

--
Peter Marshall, BCS
System Administrator, CARIS
CARIS 2005 - Mapping a Seamless Society
10th International User Group Conference and Educational Sessions
Halifax, NS, Canada
E-mail [EMAIL PROTECTED] for more.


Re: RCVD_IN_SORBS_WEB

2005-04-14 Thread Matt Kettler
Ronan McGlue wrote:

 why is the weighting for RCVD_IN_SORBS_WEB scores 0 0 0 then 0.007...

 I know there is probably a good reason for this low a score but could
 someone explain it to me please as I have one very irate user who
 likes nothing better than to pick holes in spamassassin, which in turn
 is a headache for me. 


Looking at statistics.txt, it's got a low overall hitrate, and while its
S/O is fairly good, it does in fact hit some nonspam.

Without combing the entire mass-check results of the corpus, it would be
impossible to determine the cause. However, I suspect that those few
nonspams were also being hit by other rules and the perceptron was
forced to compromise the score of this rule in order to avoid FPs.

Remember, SA's score evolver will accept 100 FN's before it will accept
1 FP. Which really is a good thing. FP's hurt, lots.. FN's are a
nuisance, but they don't cause loss of mail.

Since it's got that policy, the perceptron will try very hard to avoid
the FP. Even if it means letting some spam slip by, it's better than
tagging a bunch of legitimate mail.
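
To illustrate the trade-off, here is a toy Python cost function that charges one false positive as much as 100 false negatives. It only mirrors the ratio described above. It is not the perceptron's actual objective, and the candidate numbers and names are invented for the example.

```python
FP_WEIGHT = 100  # one false positive costs as much as 100 false negatives
FN_WEIGHT = 1


def weighted_cost(false_positives, false_negatives):
    """Toy cost for a candidate rule score: lower is better."""
    return FP_WEIGHT * false_positives + FN_WEIGHT * false_negatives


if __name__ == "__main__":
    # Hypothetical outcomes for two candidate scores of a single rule.
    high_score = weighted_cost(false_positives=1, false_negatives=20)
    low_score = weighted_cost(false_positives=0, false_negatives=90)
    print("high score cost:", high_score)
    print("low score cost:", low_score)
```

Under this weighting the low score wins even though it lets far more spam through, which is the kind of compromise that leaves a rule with a value like 0.007.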


Bayes question

2005-04-14 Thread Joe Zitnik
I apologize if this has been asked before, but I need some
clarification.  If I have autolearn for ham set to 0, and the default
BAYES_00 score assigns mail a negative value, and a spam message comes
through with enough good text in it to give it a BAYES_00 and therefore
a negative value BUT it is not a message that has been learned before,
is there the potential for that mail to be learned as ham based on the
negative BAYES score assigned it?  

If nothing else, I just wrote the king of all run on sentences.


Re: Still Stuck. bayes

2005-04-14 Thread Kelson
Peter Marshall wrote:
 "You don't want to 
 sa-learn 200 messages just to learn 5" I guess it would be doing 
 that in the Inbox and Spam Directory all the time.  I am sure some 
 users, myself included, don't always file messages as quickly as they 
 should from their inbox .. so they would end up relearning all of that 
 mail multiple times ... well .. at least running it through sa-learn 
 multiple times.  Is there a problem doing this, and if so, is there a 
 better solution for learning ham ?

It's slower, since sa-learn has to look through all the old messages to 
find the new ones, but it shouldn't mess up the training.

It's just efficiency.  If your system has the resources to handle it, 
don't worry.

--
Kelson Vibber
SpeedGate Communications www.speed.net


Re: RCVD_IN_SORBS_WEB

2005-04-14 Thread Kelson
Paolo Cravero as2594 wrote:
 Same goes for those who ask to unblock certain messages. They are told they 
 can decide to have spam pass through (periodical automatic quarantine 
 unlock, actually). In less than a day they usually beg to restore their 
 antispam protection (and who cares for that job-unrelated mailing list!).

That reminds me of a customer we had who asked us to disable all spam 
filtering on his account.  A few months later he cancelled because he 
was receiving too much spam.

A definite *headdesk* moment.
--
Kelson Vibber
SpeedGate Communications www.speed.net


Re: Bayes question

2005-04-14 Thread Matt Kettler
Joe Zitnik wrote:

I apologize if this has been asked before, but I need some
clarification.  If I have autolearn for ham set to 0, and the default
BAYES_00 score assigns mail a negative value, and a spam message comes
through with enough good text in it to give it a BAYES_00 and therefore
a negative value BUT it is not a message that has been learned before,
is there the potential for that mail to be learned as ham based on the
negative BAYES score assigned it?  
  

No. It's 100% impossible, as the bayes autolearner makes its judgments
based on the score the message would have gotten if bayes was disabled.
Avoiding that kind of self-feedback is exactly why this is done.

(Note that calculating the score as if bayes was disabled also
involves calculating the score using scoreset 0 or 1 instead of 2 or 3.)

The autolearner also ignores any userconf flagged rules, such as white
and blacklists.
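
A rough Python model of the decision described above may help. It is a simplified sketch, not SpamAssassin's actual code: the rule names and scores are made up for illustration, the two thresholds are the documented defaults for bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam, and any further conditions the real autolearner applies are left out.

```python
# Rules carrying any of these tflags are left out of the autolearn score.
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_score(hits, scoreset):
    """Score the message as if Bayes were disabled."""
    # Scoresets 2 and 3 are the Bayes-enabled ones; fall back to 0 or 1.
    non_bayes_set = scoreset & 1
    total = 0.0
    for hit in hits:
        if IGNORED_TFLAGS & set(hit.get("tflags", ())):
            continue  # Bayes rules, white/blacklists and similar do not count
        total += hit["scores"][non_bayes_set]
    return total


def autolearn_decision(hits, scoreset,
                       nonspam_threshold=0.1, spam_threshold=12.0):
    score = autolearn_score(hits, scoreset)
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


# Illustrative hits: a Bayes rule plus one hypothetical body rule.
hits = [
    {"name": "BAYES_00", "scores": (0.0, 0.0, -1.7, -2.6), "tflags": ("learn",)},
    {"name": "EXAMPLE_BODY_RULE", "scores": (1.2, 1.0, 1.4, 1.1)},
]

# With the ham threshold set to 0, the negative BAYES_00 score plays no part.
print(autolearn_decision(hits, scoreset=3, nonspam_threshold=0.0))  # prints: no
```

The point of the sketch is that the BAYES_00 score never enters the sum, so a low Bayes result cannot by itself push a message under the ham threshold.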




0 Hits on blatant spam

2005-04-14 Thread Tim Wesemann
I've been getting a lot of leak-through with 3.02 lately and I thought
this one was interesting, particularly that there are plenty of rules that
look for a certain word that rhymes with truck (YKWIM), but no header
rules that look for the word with an "ing" on the end of it.
I only see one body rule in 20_porn.cf that looks for this string in message
bodies, but it scores pretty low. I have a hunch that this word might be
somewhat common in ham, but rarely in the subject or anywhere else in the
headers of ham... here's a link to the message text:
http://www.timuel.com/badmessage.txt
Also, I can't find a complete list of which of the rules I used in
2.64 were obsoleted by the 3.x series. Perhaps this would be good wiki
fodder. I will post the rules that I am left with after my migration to 3.02
(below my sig) and anyone who feels up to it can correct me. =]
Thanks...
--
Tim Wesemann

=== Rules that were left after upgrade to 3.02 ===
10_misc.cf
20_anti_ratware.cf
20_body_tests.cf
20_compensate.cf
20_dnsbl_tests.cf
20_drugs.cf
20_fake_helo_tests.cf
20_head_tests.cf
20_html_tests.cf
20_meta_tests.cf
20_phrases.cf
20_porn.cf
20_ratware.cf
20_uri_tests.cf
23_bayes.cf
25_body_tests_es.cf
25_hashcash.cf
25_spf.cf
25_uribl.cf
30_text_de.cf
30_text_fr.cf
30_text_nl.cf
30_text_pl.cf
50_scores.cf
60_whitelist.cf
70_sare_bayes_poison_nxm.cf
70_sare_genlsubj0.cf
70_sare_genlsubj1.cf
70_sare_header0.cf
70_sare_header1.cf
70_sare_html0.cf
70_sare_html1.cf
70_sare_oem.cf
70_sare_random.cf
70_sare_specific.cf
70_sare_spoof.cf
70_sare_unsub.cf
70_sare_uri0.cf
70_sare_uri1.cf
70_sc_top200.cf
72_sare_redirect_post3.0.0.cf
88_FVGT_Bayes_Poison.cf
88_FVGT_body.cf
88_FVGT_subject.cf
88_FVGT_uri.cf
99_FVGT_Tripwire.cf
99_FVGT_meta.cf
99_sare_adult.cf
99_sare_biz_market_learn_post25x.cf
99_sare_fraud_post25x.cf
antidrug.cf
backhair.cf
bogus-virus-warnings.cf
cheat.cf
chickenpox.cf
evilnumbers.cf
languages
mangled.cf
mime_validate.cf
mr_wiggly.cf
random.current.cf
rnd_uc_char.cf
rolex.cf
useless.cf
weeds.cf
wordword.cf
x_headers.cf
===


Re: 0 Hits on blatant spam

2005-04-14 Thread Matt Kettler
Tim Wesemann wrote:

 I've been getting a lot of leak-through with 3.02 lately and I thought
 this one was interesting, particularly that there are plenty of rules
 that look for a certain word that rhymes with truck (YKWIM), but no
 header rules that look for the word with an "ing" on the end of it.

 I only see one body rule in 20_porn.cf that looks for this string in
 message bodies, but it scores pretty low. I have a hunch that this word
 might be somewhat common in ham, but rarely in the subject or anywhere
 else in the headers of ham... here's a link to the message text:

 http://www.timuel.com/badmessage.txt


It looks like you've mangled the headers a bit, making SA unable to do
DNSBL tests correctly when I test locally. However, it looks like that
message should hit SBL+XBL. 84.130.193.118 is listed.

It also should have hit several DUL tests, but that only works correctly
if your trusted_networks is working correctly.

If your MX server is NATed, make sure you've got trusted_networks set up
right so SA applies the DNSBLs properly. If it's not, you might be OK,
but make sure SA trusts all your mailservers, and nothing more or less
than that. DULs are applied to the most recent (ie: first if you work
backwards in time through the Received chain) untrusted host delivering
to a trusted host. If SA doesn't trust the right hosts, then these tests
will miss their mark.
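
The effect of trusting too few or too many hosts can be sketched in Python. This is only a model of the rule of thumb above, not SA's trust-path code: the function name is made up, the addresses are documentation-range placeholders, and the list of CIDR strings merely stands in for what trusted_networks would declare.

```python
from ipaddress import ip_address, ip_network


def first_untrusted_relay(relay_ips, trusted_networks):
    """Return the relay that DUL-style lookups would be applied to.

    relay_ips: IPs from the Received chain, most recent hop first.
    trusted_networks: CIDR strings standing in for trusted_networks.
    """
    trusted = [ip_network(net) for net in trusted_networks]
    for ip in relay_ips:
        addr = ip_address(ip)
        if not any(addr in net for net in trusted):
            return ip
    return None  # every hop was trusted


# Hypothetical chain: NATed MX <- sender's ISP relay <- dialup user.
chain = ["10.0.0.5", "192.0.2.25", "198.51.100.7"]

# Correct trust: the ISP relay is checked, not the dialup address.
print(first_untrusted_relay(chain, ["10.0.0.0/8"]))

# Too much trust: the dialup user's own IP ends up being checked.
print(first_untrusted_relay(chain, ["10.0.0.0/8", "192.0.2.0/24"]))

# Too little trust: your own MX is checked and the test misses its mark.
print(first_untrusted_relay(chain, []))
```

Under these assumptions the three calls return 192.0.2.25, 198.51.100.7 and 10.0.0.5 respectively, which is why trusting exactly your own mailservers matters.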


 Also, I can't find a complete list of what rules that I used in
 2.64 were obsoleted by the 3.x series.

I can tell you for certain that antidrug.cf is obsoleted by the standard
20_drugs.cf. Remove the old outdated file.

 == Rules that were left after upgrade to 3.02 ===
 10_misc.cf

snip

 60_whitelist.cf

Please distinguish between rules that were left after the upgrade, and
rules that were installed by it. All of the above should be in
$PREFIX/share/spamassassin, and would have been installed by SA.

All of the below should be in /etc/mail/spamassassin, and would have
been untouched by the upgrade.

 70_sare_bayes_poison_nxm.cf

Snip

 mime_validate.cf

Side note - I'd delete mime_validate.cf. It doesn't work correctly. The
author assumes that rawbody tests are run on the truly raw message
body, which is not true. The message has already been base64 and QP
decoded, so the rules will misfire on properly encoded mail which
contains properly encoded unicode, or binary attachments.

Every rule in that ruleset requires considerable rework. It's an
interesting experiment, but it's unfortunately written without
consideration for what SA does to normalize the message content before
feeding it to rules.
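
To see why a check written against the encoded form misfires, here is a small Python model. The check below is hypothetical and is not taken from mime_validate.cf; it simply assumes that a properly encoded part contains no 8-bit bytes, which is true of the text on the wire but not of the decoded content a rawbody rule is given.

```python
import base64
import re

# Hypothetical check: "a properly encoded part contains no 8-bit bytes".
EIGHT_BIT = re.compile(rb"[\x80-\xff]")

payload = "Gr\u00fc\u00dfe".encode("utf-8")    # legitimate non-ASCII text
on_the_wire = base64.b64encode(payload)        # what the sender transmitted
seen_by_rule = base64.b64decode(on_the_wire)   # decoded before rules run

print(bool(EIGHT_BIT.search(on_the_wire)))     # False
print(bool(EIGHT_BIT.search(seen_by_rule)))    # True
```

The sender did everything right, yet the check fires, because it is looking at the decoded bytes rather than the base64 text.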










RE: Rules to identify simplified and traditional chinese character sets

2005-04-14 Thread Johnson, Robert F
This header was missed by your rule example.  Does anyone have any ideas
why it was missed?  

Thanks in advance:


Header
--=_alternative 00390FDE48256FE2_=
Content-Type: text/html; charset=gb2312
Content-Transfer-Encoding: base64


Rule:
rawbody  CHINESE_WL_1_B   /\bgb2312\b/i
describe CHINESE_WL_1_B   Whitelist Simplified Chinese mimepart

full     CHINESE_WL_1_C   /^Content-Type:\s+gb2312\b/im
describe CHINESE_WL_1_C   Whitelist Simplified Chinese mimepart


-----Original Message-----
From: Loren Wilton [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 12, 2005 5:45 PM
To: users@spamassassin.apache.org
Subject: Re: Rules to identify simplified and traditional chinese character sets

 This code fragment illustrates how I do this for Internet headers:

 header   CHINESE_WL_1 Content-Type =~ /gb2312/i
 describe CHINESE_WL_1 White list Simplified Chinese

 Does anyone know how to create a rule to detect these codes in a MIME
 header?

There was talk on the dev list a while back of being able to test the
items in MIME headers.  I'm not clear on whether anything ever came of that.

In any case you can run a 'full' to look for the headers and find them.
Perhaps something like (untested):

full CHINESE_xxx /^Content-Type:\s+gb2312\b/im

Loren



Re: Rules to identify simplified and traditional chinese character sets

2005-04-14 Thread Matt Kettler
Johnson, Robert F wrote:

This header was missed by your rule example.  Does anyone have any ideas
why it was missed?  

Thanks in advance:


Header
--=_alternative 00390FDE48256FE2_=
Content-Type: text/html; charset=gb2312
Content-Transfer-Encoding: base64


Rule:
rawbody  CHINESE_WL_1_B   /\bgb2312\b/i
describe CHINESE_WL_1_B   Whitelist Simplified Chinese mimepart

full     CHINESE_WL_1_C   /^Content-Type:\s+gb2312\b/im
describe CHINESE_WL_1_C   Whitelist Simplified Chinese mimepart

  


It was not detected by the rawbody rule because the MIME part headers are
stripped before rawbody rules run, so the rule never gets a chance to
match this text at all.

It wasn't detected by the full rule because the pattern makes no allowance
for the quotes and other text that can sit between the two. It requires
gb2312 to come directly after the Content-Type:, with nothing in between
but whitespace, whereas your header has "text/html; charset=" there.

Might I suggest this instead:

full CHINESE_WL_1_D   /^Content-Type:.{0,30}\bgb2312\b/im
describe CHINESE_WL_1_D   Whitelist Simplified Chinese mimepart
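
The difference between the two patterns can be checked outside SA with a few lines of Python. This is only a sketch: Python's re module is close to Perl's regex syntax for these patterns but is not the same engine, and the test says nothing about what text SA actually hands to a full rule.

```python
import re

# The MIME part header from the original post.
part_header = (
    "--=_alternative 00390FDE48256FE2_=\n"
    "Content-Type: text/html; charset=gb2312\n"
    "Content-Transfer-Encoding: base64\n"
)

# Perl's /im modifiers correspond to IGNORECASE | MULTILINE here.
flags = re.IGNORECASE | re.MULTILINE
rule_c = re.compile(r"^Content-Type:\s+gb2312\b", flags)
rule_d = re.compile(r"^Content-Type:.{0,30}\bgb2312\b", flags)

print(bool(rule_c.search(part_header)))  # False: "text/html; charset=" is in the way
print(bool(rule_d.search(part_header)))  # True: up to 30 characters may intervene
```

The {0,30} bound keeps the match on the same header line without letting it run arbitrarily far.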




Re: SpamAssassin and Horde

2005-04-14 Thread Angelo A. Camargo
I checked trusted_networks and I don't think that is the problem. The
Received: headers on mail sent from IMP 4.x look like this:

Received: from 200-102-255-31.smace701.dsl.brasiltelecom.net.br
    (200-102-255-31.smace701.dsl.brasiltelecom.net.br [200.102.255.31]) by
    domain.tld (Horde) with HTTP for [EMAIL PROTECTED]; Thu,
    14 Apr 2005 14:19:04 -0300

This is the IP of the computer the user was sending mail from. Something
is very wrong here. Why does IMP 4.x take the user's IP and send it as the
HELO? This does not happen with IMP 3.x. I guess I have two options: hack
the IMP code to send localhost in the HELO, or make SpamAssassin ignore the
IMP headers.

Any ideas?

Full headers:
Return-Path: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: from domain.tld (localhost.localdomain [127.0.0.1])
    by odi.com.br (Postfix) with ESMTP id 1C14D19072
    for [EMAIL PROTECTED]; Thu, 14 Apr 2005 14:19:05 -0300 (BRT)
Received: by odi.com.br (Postfix, from userid 48)
    id E617919071; Thu, 14 Apr 2005 14:19:04 -0300 (BRT)
Received: from 200-102-255-31.smace701.dsl.brasiltelecom.net.br
    (200-102-255-31.smace701.dsl.brasiltelecom.net.br [200.102.255.31]) by
    webmail.domain.tld (Horde) with HTTP for [EMAIL PROTECTED]; Thu,
    14 Apr 2005 14:19:04 -0300
Message-ID: [EMAIL PROTECTED]
Date: Thu, 14 Apr 2005 14:19:04 -0300
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Teste Testando testado
MIME-Version: 1.0
Content-Type: text/plain;
    charset=ISO-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Internet Messaging Program (IMP) H3 (4.0)
X-AV-Checked: ClamAV using ClamSMTP
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on odi.com.br
X-Spam-Level: *
X-Spam-Status: Yes, score=5.7 required=5.0 tests=AWL,BAYES_00,
    HELO_DYNAMIC_HCC,HELO_DYNAMIC_IPADDR2,NO_REAL_NAME autolearn=no
    version=3.0.1
X-Spam-Report:
    *  0.0 NO_REAL_NAME From: does not include a real name
    *  3.5 HELO_DYNAMIC_IPADDR2 Relay HELO'd using suspicious hostname (IP addr 2)
    *  3.7 HELO_DYNAMIC_HCC Relay HELO'd using suspicious hostname (HCC)
    * -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
    *  [score: 0.]
    *  1.0 AWL AWL: From: address is in the auto white-list


----- Original Message -----
From: Matt Kettler [EMAIL PROTECTED]
To: Angelo Ayres Camargo [EMAIL PROTECTED]
Cc: users@spamassassin.apache.org
Sent: Tuesday, April 12, 2005 2:18 PM
Subject: Re: SpamAssassin and Horde


Angelo Ayres Camargo wrote:

 Hello,

 Mail sent from Horde IMP is being tagged as spam. This was discussed
 here before, but searching the archives I found no solution. Does anyone
 have any idea how to keep mail from Horde/IMP from being tagged as spam?

 Angelo

Angelo,

First, I assume you mean the thread with subject: Confused about
HELO_DYNAMIC_*

At the end of that thread we concluded it had nothing to do with IMP
whatsoever. Instead, it was a NATed mailserver triggering the broken
trust path problem.

If your inbound MX mailserver is NATed such that its IP is in a reserved
range (ie: 10.*, 192.168.*, 172.16.*, etc) you MUST declare
trusted_networks manually.

If you don't, ALL mail originating at dialup accounts that appear in the
Received: headers will be heavily penalized. That includes mail sent via
IMP by dialup users, but is not IMP specific. Mail sent by a dialup user
through even their own ISP's sendmail server will be subject to the same
problems.

See the wiki for details:
http://wiki.apache.org/spamassassin/TrustPath