do not add 'new line' if the description of a _SUMMARY_ line is too long

2005-11-18 Thread Philipp Snizek
 
Hi

I would like to display SA summaries in HTML tables. My problem is the
'new lines' that are added when a summary item is too long to fit on
one line; see the summary below, where the description of a test
does not fit on one line.

 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
[URIs: rabatt24.biz]
 URIBL_WS_SURBL Enthält URL in WS-Liste (www.surbl.org)
[URIs: rabatt24.biz]
 URIBL_OB_SURBL Enthält URL in OB-Liste (www.surbl.org)
[URIs: rabatt24.biz]

If SA could generate summaries that do not add a 'new line' when
the line is too long, it would be much easier for me to parse the
summary correctly and put the TEST_NAME in the first HTML table field
and the description in the second HTML table field.
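
Until such an option exists, the wrapped lines can be re-joined on the parsing side. A minimal sketch, assuming that a test line starts with an upper-case rule name and that any other non-empty line continues the previous description; the helper names are made up for illustration, and a continuation line that happens to start with an all-caps word would be misread as a new test:

```python
import re
from html import escape

# Assumption: a test line is optional whitespace, an upper-case rule name,
# whitespace, then the description. Everything else is a continuation.
RULE_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_]+)\s+(.*)$")


def join_summary(lines):
    """Return (rule_name, description) pairs with wrapped text re-joined."""
    rows = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        match = RULE_LINE.match(line)
        if match:
            rows.append([match.group(1), match.group(2).strip()])
        elif rows:
            # Continuation of the previous description, e.g. "[URIs: ...]".
            rows[-1][1] += " " + text
    return [tuple(row) for row in rows]


def to_html_rows(rows):
    """Render each pair as a two-cell HTML table row."""
    return "\n".join(
        f"<tr><td>{escape(name)}</td><td>{escape(desc)}</td></tr>"
        for name, desc in rows
    )


sample = [
    " URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist",
    "[URIs: rabatt24.biz]",
    " URIBL_WS_SURBL Enthält URL in WS-Liste (www.surbl.org)",
    "[URIs: rabatt24.biz]",
]
rows = join_summary(sample)
print(to_html_rows(rows))
```

The rule name ends up in the first table cell and the re-joined description in the second.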


Thanks 
Philipp


test for sql user prefs fails - debug not helpful

2005-11-18 Thread Dale Morin
Hello,

OS: RHES 3.0
SA 3.1.0
spamd start options: SPAMDOPTIONS="-d -D -q -x -m5 -H -u qscand
--max-conn-per-child=10"
spamass-milter 0.3.0
spamass-milter start options: SM_EXTRA_FLAGS="-i xx.xxx.xx.0/24 -r 21 -u
qscand -x -- -f -s 64000"

Here is the output from running "spamd -q -D", then running "echo -e
"From: user\nTo:user\nSubject: Test\n\n" | spamc -u dale" from a different
ssh session:

[28798] dbg: config: Conf::SQL: executing SQL: select preference, value 
from userpref where username = 'dale' or username = '@GLOBAL' order by
username asc
[28798] dbg: config: retrieving prefs for dale from SQL server
[27688] dbg: prefork: new lowest idle kid: 29185
[27688] info: spamd: handled cleanup of child pid 28798 due to SIGCHLD
[27688] dbg: prefork: child closed connection
[27688] info: prefork: child states: I
[29505] dbg: prefork: sysread(8) not ready, wait max 300 secs
[27688] info: spamd: server successfully spawned child process, pid 29505
[27688] dbg: prefork: child 29505: entering state 0
[27688] dbg: prefork: new lowest idle kid: 29185
[27688] dbg: prefork: child 29505: entering state 1
[27688] dbg: prefork: new lowest idle kid: 29185
[27688] dbg: prefork: child reports idle
[27688] info: prefork: child states: II

The "executing SQL" looks OK, as does the "retrieving prefs for dale", but
nothing happens after that.  I have verified the username/password and
that the user has select privileges (actually has select, delete, insert,
update).  I have installed the squirrelmail plugin for users to manage
their whitelists/blacklists.

Any suggestions?


-- 
Dale Morin, Mustang Internet Services, Inc.
"Support Without Compromise"
main office: 847.541.2811
direct line: 815.496.9853
email: [EMAIL PROTECTED]



Re: OT: Spammers' reactions to rejection

2005-11-18 Thread Dave Pooser
> I would vote that these "legitimate mailing list" are not so
> legitimate if they can't clean up bounces after several years of
> getting them.

Legitimate != well-run.
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"In our family, happy usually involves gunfire and at least
two patrol cars showing up." --SomethingPositive.net




Re: OT: Spammers' reactions to rejection

2005-11-18 Thread Matt Kettler

At 04:09 PM 11/18/2005, Vivek Khera wrote:

On Nov 17, 2005, at 2:05 PM, Kelson wrote:


incoming mail.  I turned them back on, unsubscribed from everything
for a few months to weed out any legitimate mailing lists that the
old users might have subscribed to, and eventually turned them into
spam


I would vote that these "legitimate mailing lists" are not so
legitimate if they can't clean up bounces after several years of
getting them.


True, but the difference between legit and "not so legit" is largely 
irrelevant.


You can't train the "not so legit" commercial emails as spam or blacklist 
the domain without having a user who's pissed off because SA's bayes 
training now thinks all mail from (insert major online store here) is spam.


You'd really be surprised how many major names suffer from this. 



Re: OT: Spammers' reactions to rejection

2005-11-18 Thread Vivek Khera


On Nov 17, 2005, at 2:05 PM, Kelson wrote:

incoming mail.  I turned them back on, unsubscribed from everything  
for a few months to weed out any legitimate mailing lists that the  
old users might have subscribed to, and eventually turned them into  
spam


I would vote that these "legitimate mailing lists" are not so
legitimate if they can't clean up bounces after several years of
getting them.




Re: apple mail better than SA?

2005-11-18 Thread Vivek Khera


On Nov 15, 2005, at 2:51 PM, Justin Mason wrote:

You may need to do some hand-classification if Apple's classifier makes
mistakes, of course.  No point training SpamAssassin with incorrect
data.


The main problem with Apple's Mail.app is that you can't tell it NOT
to learn from some folders, such as my SpamAssassin users list folder,
or my spam-l folder.  It learns from everything, which makes it hard to
filter real spam as opposed to discussion about spam.


I had to turn it off.



Re: Return-Path: ([EMAIL PROTECTED])

2005-11-18 Thread Matt Kettler
Elton Ramos Carvalho wrote:
> Matt Kettler wrote:
> 
>> Elton Ramos Carvalho wrote:
>>  
>>
>>> I`m getting some spams with "Return-Path: ([EMAIL PROTECTED])".
>>>
>>>   
>>>
> Return-Path: <[EMAIL PROTECTED]>
> Return-Path: <[EMAIL PROTECTED]>
>   
>>>
>>> Then I did this rule.
>>>
>>> header EL_NOBODY_RP Return-Path =~ /[EMAIL PROTECTED]/i
>>> describe EL_NOBODY_RP Contém nobody no return path
>>> score EL_NOBODY_RP 1.0
>>>
>>> What do you think about?
>>> Is it a good idea?
>>>   
>>
>> It seems ok, but not for a high score. (see below)
>>
>>  
>>
>>> Will it give me some ham?
>>>   
>>
>>
>> Definitely!
>>
>> I don't know if the following sites still use nobody-based return
>> paths, but I have gotten emails using them from the following sites.
>> Most of these are old, but they are actual examples.
>>
>> groklaw.com  - registration confirmation, 2/2004
>> sourceforge.net - subscription validation mail 11/2003
>> Washingtonpost.com - archived article purchase receipt 3/2003
>> mci.com - Internet abuse report acknowledgment. 8/2003
>>
>>
>>
>>
>>  
>>
> Matt, where did you get this information?
> 

I searched my mail archives in my mailclient.


Re: collaborative bayes bases

2005-11-18 Thread Justin Mason
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


"Kevin W. Gagel" writes:
> >in wiki://BayesInSpamAssassin it is said:
> >Do not train Bayes on different mail streams or public spam
> >corpora. These methods will mislead Bayes into believing
> >certain tokens are spammy or hammy when they are not.
> >
> >Could you explain why this is so, and what could happen if one
> >trains Bayes from several mail servers?
> 
> The idea in training bayes is to train it for your server.
> Using someone else's mail to train it results in a bayes
> server trained for their email.
> 
> Their email may or may not resemble what you consider as
> spam or ham. That is what the problem is.

Yes.  Also, another problem is that if you exclusively use one class of
mail from that server, e.g. all the mail collected from that server is
spam, then what your training will do is train SpamAssassin to recognise
all mail from that server as spam.

In reality, often there are other types of mail coming from that server,
as well as spam -- but unless you train with those mails, SpamAssassin
won't know that.
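
A toy illustration of that skew, under loud assumptions: this is a plain frequency ratio, not the probability combining SpamAssassin actually uses, and the token "relay.example.net" and the sample messages are invented.

```python
def token_spam_probability(token, spam_msgs, ham_msgs):
    """Naive estimate of P(spam | token) with equal class priors.

    Each message is a set of tokens. Returns 0.5 for an unseen token.
    """
    spam_rate = sum(token in m for m in spam_msgs) / len(spam_msgs) if spam_msgs else 0.0
    ham_rate = sum(token in m for m in ham_msgs) / len(ham_msgs) if ham_msgs else 0.0
    if spam_rate + ham_rate == 0:
        return 0.5
    return spam_rate / (spam_rate + ham_rate)


# Only spam was collected from the (hypothetical) server relay.example.net.
spam = [{"relay.example.net", "pills"}, {"relay.example.net", "offer"}]
ham = [{"meeting", "notes"}]
p_spam_only = token_spam_probability("relay.example.net", spam, ham)

# Once ham from the same server is trained too, the token is less damning.
ham_with_server = ham + [{"relay.example.net", "invoice"}]
p_balanced = token_spam_probability("relay.example.net", spam, ham_with_server)

print(p_spam_only, round(p_balanced, 3))
```

With spam-only training the server token looks like a perfect spam indicator; adding ham from the same server pulls it back toward neutral.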

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFDfhV0MJF5cimLx9ARAjkSAJ0WieDVB1sPy7KnWbXJUppZTrBnkgCgjYjc
GPQZTG45xQIzvkxxP6eL1/o=
=sI+S
-----END PGP SIGNATURE-----



Re: [sa-list] Re: OT: Spammers' reactions to rejection

2005-11-18 Thread Kris Deugau
"Dan Mahoney, System Admin" wrote:
> Three firewall rules I think nobody should live without:
> 
> 1) ipfw add 500 allow tcp from any to me 25 limit src-addr 2 setup
> 
> Yup, you read that right.  Limits tcp connections to no more than two
> per connecting address.  You could probably even drop that to one.

A nice thought, but absolutely useless in the case where you receive any
volume of mail from a host running qmail.  :(

qmail, in case you don't know already, does not serialize mail delivery
by reusing a single connection (like just about every other MTA in
existence).  One message == one recipient == one connection.  >:(

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!


Re: Unicode right to left HTML override obsfucation

2005-11-18 Thread Kenneth Porter
--On Friday, November 18, 2005 1:36 PM +0200 Sean Doherty 
<[EMAIL PROTECTED]> wrote:



Are there any rules available for catching messages that use the Unicode
right-to-left override in HTML to reverse text (sample attached)?

For instance 'H&#8238;olle&#8236; W&#8238;dlro&#8236;' would render as
'Hello World'

I've seen a couple of these sneak through recently. I don't want to create
a rule that just looks for &#8238; and &#8236; as I'm not sure what the FP
rate would be like. Are there legitimate reasons for using these tags?
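
For reference, the trick can be reproduced outside SpamAssassin. A minimal sketch, assuming the entities involved are &#8238; (U+202E, right-to-left override) and &#8236; (U+202C, pop directional formatting); the function names are invented, and the "rendering" below only reverses each override run rather than implementing the full Unicode bidi algorithm:

```python
import html

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, written &#8238; in HTML
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, written &#8236; in HTML


def has_rtl_override(html_text):
    """True if the decoded text contains a right-to-left override."""
    return RLO in html.unescape(html_text)


def visual_order(html_text):
    """Approximate what a reader sees by reversing each RLO...PDF run."""
    text = html.unescape(html_text)
    out = []
    i = 0
    while i < len(text):
        if text[i] == RLO:
            end = text.find(PDF, i + 1)
            if end == -1:
                end = len(text)
            out.append(text[i + 1:end][::-1])
            i = end + 1
        else:
            if text[i] != PDF:
                out.append(text[i])
            i += 1
    return "".join(out)


sample = "H&#8238;olle&#8236; W&#8238;dlro&#8236;"
print(has_rtl_override(sample), visual_order(sample))
```

A check like this says nothing about the false-positive rate, since legitimate right-to-left text may also carry directional controls.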


I've never seen that construct, but I have my mail client always set to 
show me the text/plain alternative if it's available.


You might find this page helpful:






Re: collaborative bayes bases

2005-11-18 Thread Kevin W. Gagel
>in wiki://BayesInSpamAssassin it is said:
>Do not train Bayes on different mail streams or public spam
>corpora. These methods will mislead Bayes into believing
>certain tokens are spammy or hammy when they are not.
>
>Could you explain why this is so, and what could happen if one
>trains Bayes from several mail servers?

The idea in training bayes is to train it for your server.
Using someone else's mail to train it results in a bayes
server trained for their email.

Their email may or may not resemble what you consider as
spam or ham. That is what the problem is.

=
Kevin W. Gagel
Network Administrator
Information Technology Services
(250) 562-2131 local 448
My Blog:
http://mail.cnc.bc.ca/blogs/gagel

---
The College of New Caledonia, Visit us at http://www.cnc.bc.ca
Virus scanning is done on all incoming and outgoing email.
Anti-spam information for CNC can be found at http://avas.cnc.bc.ca
---


Re: Avoiding FP on domain fragments?

2005-11-18 Thread Matt Kettler
David Gibbs wrote:
> Folks:
> 
> I just got a message that was flagged as spam due to the URIBL_JP_SURBL
> rule ... it matched on the URI 'range.com' ... my domain, midrange.com,
> is what triggered it.
> 
> 
>>3.4 URIBL_JP_SURBL Contains an URL listed in the JP SURBL
>>blocklist
>>   [URIs: range.com]
> 
> 
> Can anyone recommend a way of counteracting the match?


What version of SA are you on? Doesn't happen for me using SA 3.1.0.


Re: Return-Path: ([EMAIL PROTECTED])

2005-11-18 Thread Matt Kettler
Elton Ramos Carvalho wrote:
> I`m getting some spams with "Return-Path: ([EMAIL PROTECTED])".
> 
>>> Return-Path: <[EMAIL PROTECTED]>
>>> Return-Path: <[EMAIL PROTECTED]>
> 
> 
> Then I did this rule.
> 
> header EL_NOBODY_RP Return-Path =~ /[EMAIL PROTECTED]/i
> describe EL_NOBODY_RP Contém nobody no return path
> score EL_NOBODY_RP 1.0
> 
> What do you think about?
> Is it a good idea?
It seems ok, but not for a high score. (see below)

> Will it give me some ham?

Definitely!

I don't know if the following sites still use nobody-based return paths, but
I have gotten emails using them from the following sites. Most of these are old,
but they are actual examples.

groklaw.com  - registration confirmation, 2/2004
sourceforge.net - subscription validation mail  11/2003
Washingtonpost.com - archived article purchase receipt 3/2003
mci.com - Internet abuse report acknowledgment. 8/2003
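
The ham risk is easy to see with a quick test. A minimal sketch: the address in the original rule is redacted in the archive, so the pattern `nobody@` below is only a hypothetical stand-in, and the sample Return-Path values are invented.

```python
import re

# Hypothetical stand-in for the redacted pattern in EL_NOBODY_RP.
NOBODY_RP = re.compile(r"nobody@", re.IGNORECASE)

# Invented Return-Path values with the class each message belongs to.
samples = {
    "<nobody@spam-host.example>": "spam",
    "<nobody@www.shop.example>": "ham (registration confirmation)",
    "<alice@example.org>": "ham",
}

hits = [rp for rp in samples if NOBODY_RP.search(rp)]
for rp in hits:
    print(rp, "->", samples[rp])
```

The rule fires on the automated ham as well as the spam, which is why a low score is the safer choice.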




Re: OT: Spammers' reactions to rejection

2005-11-18 Thread John Hardin
On Thu, 2005-11-17 at 11:55, Christian Recktenwald wrote:
> On Thu, Nov 17, 2005 at 11:42:44AM -0800, John Woolsey wrote:
> > It would be an interesting addition to a honeypot. Make the mail server
> > just hang up and not respond to tie up connections on the spammer.
> 
> There's a cool piece of software holding tcp connections
> alive as long as possible called "labrea".
> But be careful: it saturated the NAT table of my firewall after
> some hours by holding hundreds of "connections" alive. ;-)

The problem with LaBrea for tarpitting SMTP connections is it does not
send the SMTP server greeting before tarpitting the connection, so the
client will only be trapped for the duration of its own
wait-for-greeting timeout.

I tried to talk Tom into adding a port-number/response-string option to
LaBrea to more effectively trap such protocols, but I haven't looked at
it lately to see if anything along these lines has been done.

--
John Hardin
Development and Technology group (Seattle)
CRS Retail Systems, Inc.
3400 188th Street SW, Suite 185
Lynnwood, WA 98037
voice: (425) 672-1304
  fax: (425) 672-0192
email: [EMAIL PROTECTED]
  web: http://www.crsretail.com
---
 If you smash a computer to bits with a mallet, that appears to count
 as encryption in the state of Nevada.
   - CRYPTO-GRAM 12/2001
---



Avoiding FP on domain fragments?

2005-11-18 Thread David Gibbs
Folks:

I just got a message that was flagged as spam due to the URIBL_JP_SURBL
rule ... it matched on the URI 'range.com' ... my domain, midrange.com,
is what triggered it.

> 3.4 URIBL_JP_SURBL Contains an URL listed in the JP SURBL
> blocklist
>[URIs: range.com]

Can anyone recommend a way of counteracting the match?

Thanks!

david



Return-Path: ([EMAIL PROTECTED])

2005-11-18 Thread Elton Ramos Carvalho

I'm getting some spam with "Return-Path: ([EMAIL PROTECTED])".


Return-Path: <[EMAIL PROTECTED]>
Return-Path: <[EMAIL PROTECTED]>


So I wrote this rule.

header EL_NOBODY_RP Return-Path =~ /[EMAIL PROTECTED]/i
describe EL_NOBODY_RP Contém nobody no return path
score EL_NOBODY_RP 1.0

What do you think about it?
Is it a good idea?
Will it hit any ham?

Elton Carvalho



Re: [sa-list] Re: OT: Spammers' reactions to rejection

2005-11-18 Thread Dan Mahoney, System Admin

On Thu, 17 Nov 2005, mouss wrote:

Three firewall rules I think nobody should live without:

1) ipfw add 500 allow tcp from any to me 25 limit src-addr 2 setup

Yup, you read that right.  Limits tcp connections to no more than two per 
connecting address.  You could probably even drop that to one.


2) ipfw add 600 allow tcp from any to any 25 uid root

Yeah, seems simple, allows root to connect to other machines on port 25. 
Until you come to this:


3) ipfw add 610 deny log logamount 100 tcp from any to any 25 out

Matches AFTER the above rule.  Meaning?  User processes can't connect to 
send outbound mail anymore.  They HAVE TO go through the local MTA (where, 
presumably, the UID/PID can be logged).


So the next time a user has a crap phpBB or something that lets exploits 
through -- I've got that much less to worry about.


-Dan



Roger Taranto a écrit :



If it didn't tie up sockets on our machines, it seems like instead of
rejecting the mail, we should just hold on to the mail connection for as
long as possible.  It wouldn't take too long to tie up all of their
outbound connections and back up their mail server.  Unfortunately, it
punishes our mail servers, too. :(



One way to do that would be to "pass the descriptor" to a lightweight process
that only keeps them connected, for example by setting the TCP window to zero.
However, this would only be safe if you modify the TCP stack to do that without
keeping too much state.


On the other hand, they have so much bandwidth/power available via zombies 
that this seems like playing a self-dos game.




--

"I wish the Real World would just stop hassling me!"

-Matchbox 20, Real World, off the album "Yourself or Someone Like You"


Dan Mahoney
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---


RE: Rule for this

2005-11-18 Thread Casey King
I am still receiving spam that is wrapped in HTML code.  I am not sure
why the rule I added is not picking it up.  From what I read, it seems
to work for others, but after adding it to my local.cf and running --lint
with no errors, my spam checks still ignore it.  What can I do to stop this?

-----Original Message-----
From: Jean-Paul Natola [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 14, 2005 2:50 PM
To: Gene Heskett; users@spamassassin.apache.org
Subject: RE: Rule for this ??-LINT



On Monday 14 November 2005 11:22, Casey King wrote:
>Okay,
>
>I have the rule in my local.cf as
>
>body L_DRUGS11 /([CVAXP] ){5}/
>header L_DRUGS12 MESSAGEID =~ 
>/^<[EMAIL PROTECTED]>/
>meta L_DRUGS1 L_DRUGS11 && L_DRUGS12
>score L_DRUGS1 5
>describe L_DRUGS1 Strange Message-ID and Spam signature in body
>
>
>Since it did not seem to get picked up by the rule.  I updated 
>rulesdujour from the command line:
>
>./rules_du_jour
>
This sounds like a great idea.

If it works with 3.0.4, where can I get it?

>No errors were reported.
>
>Doing a spamassassin --lint returned no errors.
>
>To see if I could stop this type of message, I sent from one of my  
>trash accounts, and this is what happens when the message comes  
>through. Still not getting tagged with the new rule.
>
>
>-1.80  ALL_TRUSTED          Did not pass through any untrusted hosts
>-2.71  AWL                  From: address is in the auto white-list
> 0.50  HTML_40_50           Message is 40% to 50% HTML
> 0.00  HTML_MESSAGE         HTML included in message
> 0.64  SARE_MSGID_LONG40    Message ID has suspicious length
> 0.69  SARE_SPEC_LEO_LINE06
> 5.00  SARE_URI_EQUALS      Trying to hide the real URL with IE parsing bug
> 0.00  UPPERCASE_25_50      message body is 25-50% uppercase
>
>-----Original Message-----
>From: Pierre Thomson [mailto:[EMAIL PROTECTED]
>Sent: Monday, November 14, 2005 9:19 AM
>To: Casey King; SpamAssassin Users
>Subject: RE: Rule for this ??
>
>Casey King wrote:
>>> body L_DRUGS11 /([CVAXP] ){5}/
>>> header L_DRUGS12 MESSAGEID =~ 
>>> /^<[EMAIL PROTECTED]>/
>>> meta L_DRUGS1 L_DRUGS11 && L_DRUGS12
>>> score L_DRUGS1 5
>>> describe L_DRUGS1 Strange Message-ID and Spam signature in body.
>>
>> This rule goes in the local.cf file right?  I added this rule, and
>> restarted MailScanner and it does not seem to be reading the rule.  I
>> am not so good with writing rules, but I was wondering
>>
>> Body L_DRUGS11
>> Score L_DRUGS1
>>
>> Are these supposed to be set this way, or do these both need to be
>> set to '1' or '11'???
>
>There are two sub-rules (L_DRUGS11 and L_DRUGS12) and one meta rule
>(L_DRUGS1) which gets the score and description.  But you might have a 
>problem with the line wrap; the line starting with "header" should end 
>in "+>/".  Run "spamassassin --lint" to check your configuration.
>
>Pierre
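
The sub-rule/meta-rule relationship Pierre describes can be sketched as follows. The body pattern is copied from the thread; the Message-ID pattern is redacted in the archive, so it is represented here only by a boolean argument, and the function name is invented.

```python
import re

# body L_DRUGS11 /([CVAXP] ){5}/  (pattern as posted in the thread)
L_DRUGS11 = re.compile(r"([CVAXP] ){5}")


def l_drugs1(body, msgid_matches):
    """meta L_DRUGS1 = L_DRUGS11 && L_DRUGS12.

    msgid_matches stands in for the header sub-rule L_DRUGS12, whose
    regular expression is redacted in the archive.
    """
    return bool(L_DRUGS11.search(body)) and msgid_matches


# Only the meta rule carries the score (5) and the description.
print(l_drugs1("C V A X P pills", True))    # both sub-rules hit
print(l_drugs1("C V A X P pills", False))   # header sub-rule misses
print(l_drugs1("cheap pills", True))        # body sub-rule misses
```

If either sub-rule fails to match, for example because the wrapped "header" line was split in local.cf, the meta rule never fires and no score is added.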



Hi all, I *believe* I have applied the following rule correctly.

To verify, I ran --lint. It all checked out OK, BUT it's giving some
errors with respect to the whitelisted entries I have in the local.cf
that resides in the SA directory.

I know my whitelist works, as I had a previously rejected message resent
and it came through without a hitch.

Here's the output from lint.

And no, I did NOT add the custom rule to the local.cf.



milter# spamassassin --lint
[923] warn: config: SpamAssassin failed to parse line, "[EMAIL PROTECTED]" is not valid for "whitelist_from_rcvd", skipping: whitelist_from_rcvd [EMAIL PROTECTED]
[923] warn: config: SpamAssassin failed to parse line, "[EMAIL PROTECTED]" is not valid for "whitelist_from_rcvd", skipping: whitelist_from_rcvd [EMAIL PROTECTED]
[923] warn: config: SpamAssassin failed to parse line, "[EMAIL PROTECTED]" is not valid for "whitelist_from_rcvd", skipping: whitelist_from_rcvd [EMAIL PROTECTED]
[923] warn: config: SpamAssassin failed to parse line, "[EMAIL PROTECTED]" is not valid for "whitelist_from_rcvd", skipping: whitelist_from_rcvd [EMAIL PROTECTED]
[923] warn: config: SpamAssassin failed to parse line, "[EMAIL PROTECTED]" is not valid for "whitelist_from_rcvd", skipping: whitelist_from_rcvd [EMAIL PROTECTED]
[923] warn: config: SpamAssassin failed to parse line, "[EMAIL PROTECTED]" is not valid for "whitelist_from_rcvd", skipping: whitelist_from_rcvd [EMAIL PROTECTED]
[923] warn: lint: 6 issues detected, please rerun with debug enabled for more information
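
One possible reading of these warnings, offered as an assumption rather than a diagnosis: as far as I recall from the Mail::SpamAssassin::Conf documentation, whitelist_from_rcvd expects two arguments (an address and the reverse-DNS domain of the relay), whereas each skipped line above shows only an address. A rough pre-check of local.cf lines could look like this; the function name and the example addresses are invented.

```python
def check_whitelist_from_rcvd(line):
    """Return True/False for a whitelist_from_rcvd line, None otherwise.

    Assumption: the directive takes exactly two arguments, an address
    and the relay's reverse-DNS domain.
    """
    parts = line.split()
    if not parts or parts[0] != "whitelist_from_rcvd":
        return None
    return len(parts) == 3


print(check_whitelist_from_rcvd("whitelist_from_rcvd user@example.com"))
print(check_whitelist_from_rcvd("whitelist_from_rcvd user@example.com example.com"))
print(check_whitelist_from_rcvd("whitelist_from user@example.com"))
```

If that reading is right, entries with only an address would belong under whitelist_from instead, or need the relay domain added.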



Re: collaborative bayes bases

2005-11-18 Thread Anthony Peacock
Hi,

> I see. That's a very good point, about sharing the Bayes within a
> different community.
> 
> Anyone see a problem with a single-user collecting spam (and ham) from
> various personal mailboxes that came in from different internet
> service providers and doing a sa-learn on it?

This really depends on your local circumstances.  Are you running 
SpamAssassin purely for your own emails or are you running a server 
for a group of users?

If you are running this purely for your own requirements then you get 
to decide what is and isn't spam, and can use whatever source you 
want to feed the Bayes learning process.

However, if you are running this for a group of users you need to be 
careful that you collect a representative sample of spam and ham for 
your users.  Otherwise you risk skewing the Bayes results to meet 
your personal definition, which might differ from that of your users.

Of course, if you have personal Bayes scores, then each individual 
can train the system to their taste.





> David Roth
> rothmail (at) comcast (dot) net
> 
> On Nov 18, 2005, at 8:45 AM, Anthony Peacock wrote:
> 
> > Hi,
> >
> > I actually think it is more to do with the fact that one person's
> > spam could be another person's ham.  If the mail streams and servers
> > are carrying messages for a community of users who receive (and want
> > to receive) similar types of email messages, I can't see any major
> > problem with using those emails to train Bayes.  However, if the
> > servers are processing email for two completely different user
> > communities their ideas of what is and isn't spam could be so
> > different that the Bayes stats become diluted.
> >
> > For instance I work for a Medical School, but in a heavily IT based
> > department.  Some terms that may be considered pornographic for
> > someone working in banking could be  perfectly acceptable in my
> > environment.
> >
> >
> >
> >> If I am understanding this correctly...the concern is that the Bayes
> >> should match the mail server in which the ham and spam was received
> >> on only?
> >>
> >> David Roth
> >> rothmail (at) comcast.net (dot) net
> >>
> >> On Nov 18, 2005, at 5:10 AM, qMax wrote:
> >>
> >>> in wiki://BayesInSpamAssassin it is said:
> >>> Do not train Bayes on different mail streams or public spam
> >>> corpora. These methods will mislead Bayes into believing certain
> >>> tokens are spammy or hammy when they are not.
> >>>
> >>> Could you explain why this is so, and what could happen if one
> >>> trains Bayes from several mail servers?
> >>>
> >>> -- 
> >>>  qMax
> >>>
> >>
> >
> >
> > -- 
> > Anthony Peacock
> > CHIME, Royal Free & University College Medical School
> > WWW:http://www.chime.ucl.ac.uk/~rmhiajp/
> > "Computer  software  consists of  only  two  components:
> > ones and zeros, in roughly equal proportions.   All that is
> > required is to sort them into the correct order."
> >
> >
> 


-- 
Anthony Peacock   
CHIME, Royal Free & University College Medical School
WWW:http://www.chime.ucl.ac.uk/~rmhiajp/
"On two occasions I have been asked [by members of Parliament!],
'Pray, Mr. Babbage, if you put into the machine wrong figures, will
the right answers come out?'  I am not able rightly to apprehend the
kind of confusion of ideas that could provoke such a
question." -- Charles Babbage




Re: collaborative bayes bases

2005-11-18 Thread David A . Roth
I see. That's a very good point, about sharing the Bayes within a 
different community.


Anyone see a problem with a single-user collecting spam (and ham) from 
various personal mailboxes that came in from different internet service 
providers and doing a sa-learn on it?


David Roth
rothmail (at) comcast (dot) net

On Nov 18, 2005, at 8:45 AM, Anthony Peacock wrote:


Hi,

I actually think it is more to do with the fact that one person's
spam could be another person's ham.  If the mail streams and servers
are carrying messages for a community of users who receive (and want
to receive) similar types of email messages, I can't see any major
problem with using those emails to train Bayes.  However, if the
servers are processing email for two completely different user
communities their ideas of what is and isn't spam could be so
different that the Bayes stats become diluted.

For instance I work for a Medical School, but in a heavily IT based
department.  Some terms that may be considered pornographic for
someone working in banking could be  perfectly acceptable in my
environment.




If I am understanding this correctly...the concern is that the Bayes
should match the mail server in which the ham and spam was received on
only?

David Roth
rothmail (at) comcast.net (dot) net

On Nov 18, 2005, at 5:10 AM, qMax wrote:


in wiki://BayesInSpamAssassin it is said:
Do not train Bayes on different mail streams or public spam corpora.
These methods will mislead Bayes into believing certain tokens are
spammy or hammy when they are not.

Could you explain why this is so, and what could happen if one
trains Bayes from several mail servers?

--
 qMax






--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:http://www.chime.ucl.ac.uk/~rmhiajp/
"Computer  software  consists of  only  two  components:
ones and zeros, in roughly equal proportions.   All that is
required is to sort them into the correct order."






RE: Re[2]: RATWARE_ZERO_TZ=4.1

2005-11-18 Thread Bowie Bailey
From: Robert Menschel [mailto:[EMAIL PROTECTED]
> 
> Wednesday, November 16, 2005, 1:47:24 PM, you wrote:
> 
> SL> I guess if this is the case I need to lower
> SL> the score for that rule as my kill value is a 3.5, ...
> 
> SpamAssassin scores are optimized for a "this is spam" threshold of 5.
> 
> Anyone who changes their threshold significantly away from 5 (say
> above 7 or below 4.5) will NEED to modify several significant rule
> scores.  Which rules, by how much, will vary by your specific needs.
> 
> (I used to run SA with a threshold of 9, and it worked extremely well
> for me, after modifying about 25 rule scores.)
> 
> Moving the threshold down is much more sensitive to flagging errors
> than moving the threshold up.
> 
> Yes, if you choose to have your threshold below the level at which one
> rule scores spam, then you need to rescore that rule if you're
> concerned about ham hits with that rule.

It all depends on your setup.  I have SA running here with all the
standard rules plus per-user Bayes, Razor, DCC, all of the 0-level
SARE rules, and a few others.  I haven't modified any of the scores.
The overall threshold for the server is 5, but I have set my personal
threshold to 4.  I see two or three false positives a week and those
are usually due to either spam samples sent to this list or code
samples that are caught by the Chickenpox rules.
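
Robert's point about a threshold below a single rule's score can be checked with simple arithmetic. A minimal sketch: the rule name and its 4.1 score come from the thread subject, 3.5 is the "kill value" mentioned in the quoted text, and the function itself is only an illustration of comparing a summed score against the threshold.

```python
def is_spam(hits, threshold=5.0):
    """A message is flagged when the summed rule scores reach the threshold."""
    return sum(hits.values()) >= threshold


hits = {"RATWARE_ZERO_TZ": 4.1}

print(is_spam(hits))                 # default threshold of 5
print(is_spam(hits, threshold=3.5))  # lowered threshold
```

At the default threshold the single hit is not enough on its own; at 3.5 the same hit flags the message by itself, which is the case where rescoring that rule becomes necessary.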

Bowie


Re: collaborative bayes bases

2005-11-18 Thread Anthony Peacock
Hi,

I actually think it is more to do with the fact that one person's 
spam could be another person's ham.  If the mail streams and servers 
are carrying messages for a community of users who receive (and want 
to receive) similar types of email messages, I can't see any major 
problem with using those emails to train Bayes.  However, if the 
servers are processing email for two completely different user 
communities their ideas of what is and isn't spam could be so 
different that the Bayes stats become diluted.

For instance I work for a Medical School, but in a heavily IT based 
department.  Some terms that may be considered pornographic for 
someone working in banking could be  perfectly acceptable in my 
environment.



> If I am understanding this correctly...the concern is that the Bayes
> should match the mail server in which the ham and spam was received on
> only?
> 
> David Roth
> rothmail (at) comcast.net (dot) net
> 
> On Nov 18, 2005, at 5:10 AM, qMax wrote:
> 
> > in wiki://BayesInSpamAssassin it is said:
> > Do not train Bayes on different mail streams or public spam corpora.
> > These methods will mislead Bayes into believing certain tokens are
> > spammy or hammy when they are not.
> >
> > Could you explain why this is so, and what could happen if one
> > trains Bayes from several mail servers?
> >
> > -- 
> >  qMax
> >
> 


-- 
Anthony Peacock   
CHIME, Royal Free & University College Medical School
WWW:http://www.chime.ucl.ac.uk/~rmhiajp/
"Computer  software  consists of  only  two  components: 
ones and zeros, in roughly equal proportions.   All that is
required is to sort them into the correct order."




Re: collaborative bayes bases

2005-11-18 Thread David A . Roth
If I am understanding this correctly...the concern is that the Bayes 
should match the mail server in which the ham and spam was received on 
only?


David Roth
rothmail (at) comcast.net (dot) net

On Nov 18, 2005, at 5:10 AM, qMax wrote:


in wiki://BayesInSpamAssassin it is said:
Do not train Bayes on different mail streams or public spam corpora.
These methods will mislead Bayes into believing certain tokens are
spammy or hammy when they are not.

Could you explain why this is so, and what could happen if one
trains Bayes from several mail servers?

--
 qMax





Re:Help with bayes configuration

2005-11-18 Thread Pierre Faudon

I forgot ... the version !
 
[EMAIL PROTECTED] spamassassin]# spamassassin -V
SpamAssassin version 3.0.4
  running on Perl version 5.8.0
 
 
> Hello,


>  
> I installed SpamAssassin a month ago with the Bayes auto-learn option, but about 60% of spam is still not detected. My Bayes DB was empty:
>  
> [EMAIL PROTECTED] root]# sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          0          0  non-token data: nspam
> 0.000          0          0          0  non-token data: nham
> 0.000          0          0          0  non-token data: ntokens
> 0.000          0          0          0  non-token data: oldest atime
> 0.000          0          0          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
>  
> Today I rebuilt the DB, but it still does not seem to be working:
> [EMAIL PROTECTED] root]# sa-learn --ham --no-rebuild /etc/mail/spamassassin/
> The --no-rebuild option has been deprecated.  Please use --no-sync instead.
> Learned from 5 message(s) (5 message(s) examined).
> [EMAIL PROTECTED] root]# sa-learn --spam --no-rebuild /etc/mail/spamassassin/
> The --no-rebuild option has been deprecated.  Please use --no-sync instead.
> Learned from 6 message(s) (6 message(s) examined).
> [EMAIL PROTECTED] root]#
> [EMAIL PROTECTED] root]# sa-learn --rebuild
> The --rebuild option has been deprecated.  Please use --sync instead.
> synced Bayes databases from journal in 0 seconds: 549 unique entries (549 total entries)
> [EMAIL PROTECTED] root]#
> [EMAIL PROTECTED] spamassassin]# sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          6          0  non-token data: nspam
> 0.000          0          5          0  non-token data: nham
> 0.000          0        337          0  non-token data: ntokens
> 0.000          0 1132317402          0  non-token data: oldest atime
> 0.000          0 1132317451          0  non-token data: newest atime
> 0.000          0 1132317466          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
>  
>  
>  
> My spamassassin configuration :
>  
> [EMAIL PROTECTED] spamassassin]# cat local.cf
> # This is the right place to customize your installation of SpamAssassin.
> #
> # See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
> # tweaked.
> #
> # rewrite_header Subject *SPAM*
> # report_safe 1
> trusted_networks 10. 192.168.1.
> # lock_method flock
>
> # Scoring
> required_score 5
> # Score for a spam probability between 50 and 60%:
> score DCC_CHECK 4.000
> score RAZOR2_CHECK 2.500
>
> score BAYES_60 3
> # Score for a probability between 60 and 70%
> score BAYES_70 4
> score BAYES_80 4.8
> score BAYES_95 5
> score BAYES_99 6
>
> #user_scores_dsn   DBI:mysql:spamassassin:127.0.0.1
> #user_scores_sql_username  spamassassin
> #user_scores_sql_password
>
> # Encapsulation ?
> report_safe 0
>
> dns_available yes
>
> # Settings bayes
> bayes_path /etc/mail/spamassassin/
> bayes_file_mode 0777
> use_auto_whitelist 1
> auto_whitelist_path /etc/mail/spamassassin/whitelist
>
> use_bayes   1
> use_bayes_rules 1
> bayes_auto_learn    1
> bayes_auto_learn_threshold_spam 25
> bayes_auto_learn_threshold_nonspam -5
> bayes_min_ham_num 60
> bayes_min_spam_num 100
>
> #required_hits   2.6
> rewrite_subject 1
> subject_tag *SPAM*
>
> # Enable or disable network checks
> skip_rbl_checks 0
> use_razor2  1
> use_dcc 1
> use_pyzor   0
>
> # Mail using languages used in these country codes will not be marked
> # as being possibly spam in a foreign language.
> # - english french
>
> # Mail using locales used in these country codes will not be marked
> # as being possibly spam in a foreign language.
>  
> [EMAIL PROTECTED] spamassassin]#
>  

> 
> 
> Accédez au courrier électronique de La Poste : www.laposte.net ;
> 3615 LAPOSTENET (0,34 /mn) ; tél : 08 92 68 13 50 (0,34/mn)






Help with bayes configuration

2005-11-18 Thread Pierre Faudon

Hello,
 
I installed SpamAssassin a month ago with the Bayes auto-learn option enabled, but 60% of spam is still not detected. There was nothing in my Bayes database:
 
[EMAIL PROTECTED] root]# sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  0  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0  0  0  non-token data: oldest atime
0.000  0  0  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count
 
Today I rebuilt the database, but it still does not seem to work:
[EMAIL PROTECTED] root]# sa-learn --ham --no-rebuild /etc/mail/spamassassin/
The --no-rebuild option has been deprecated.  Please use --no-sync instead.
Learned from 5 message(s) (5 message(s) examined).
[EMAIL PROTECTED] root]# sa-learn --spam --no-rebuild /etc/mail/spamassassin/
The --no-rebuild option has been deprecated.  Please use --no-sync instead.
Learned from 6 message(s) (6 message(s) examined).
[EMAIL PROTECTED] root]#
[EMAIL PROTECTED] root]# sa-learn --rebuild
The --rebuild option has been deprecated.  Please use --sync instead.
synced Bayes databases from journal in 0 seconds: 549 unique entries (549 total entries)
[EMAIL PROTECTED] root]#
[EMAIL PROTECTED] spamassassin]# sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  6  0  non-token data: nspam
0.000  0  5  0  non-token data: nham
0.000  0  337  0  non-token data: ntokens
0.000  0  1132317402  0  non-token data: oldest atime
0.000  0  1132317451  0  non-token data: newest atime
0.000  0  1132317466  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count
 
 
 
My SpamAssassin configuration:
 
[EMAIL PROTECTED] spamassassin]# cat local.cf
# This is the right place to customize your installation of SpamAssassin.
#
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
# rewrite_header Subject *SPAM*
# report_safe 1
trusted_networks 10. 192.168.1.
# lock_method flock
# Scoring
required_score 5
# Score for a spam probability between 50 and 60%:
score DCC_CHECK 4.000
score RAZOR2_CHECK 2.500
score BAYES_60 3
# Score for a probability between 60 and 70%
score BAYES_70 4
score BAYES_80 4.8
score BAYES_95 5
score BAYES_99 6
#user_scores_dsn   DBI:mysql:spamassassin:127.0.0.1
#user_scores_sql_username  spamassassin
#user_scores_sql_password  
# Encapsulation?
report_safe 0
 
dns_available yes
# Bayes settings
bayes_path /etc/mail/spamassassin/
bayes_file_mode 0777
use_auto_whitelist 1
auto_whitelist_path /etc/mail/spamassassin/whitelist
use_bayes   1
use_bayes_rules 1
bayes_auto_learn    1
bayes_auto_learn_threshold_spam 25
bayes_auto_learn_threshold_nonspam -5
bayes_min_ham_num 60
bayes_min_spam_num 100
#required_hits   2.6
rewrite_subject 1
subject_tag *SPAM*
# Enable or disable network checks
skip_rbl_checks 0
use_razor2  1
use_dcc 1
use_pyzor   0
# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
# - english french
# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
 
[EMAIL PROTECTED] spamassassin]#
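
The second dump and the configuration above can be cross-checked with a short script. What follows is a minimal sketch in Python and is not part of the original message; parse_magic and bayes_ready are hypothetical helper names. It relies on the documented behaviour that SpamAssassin does not apply Bayes scores until the learned spam and ham counts reach bayes_min_spam_num and bayes_min_ham_num.

```python
import re

# Lines copied from the second `sa-learn --dump magic` output in this message.
DUMP = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0  6  0  non-token data: nspam
0.000  0  5  0  non-token data: nham
0.000  0  337  0  non-token data: ntokens
"""


def parse_magic(dump):
    """Return {label: value} for the 'non-token data' lines of the dump."""
    values = {}
    for line in dump.splitlines():
        # In these lines the third numeric column carries the value.
        m = re.match(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+)$", line)
        if m:
            values[m.group(2).strip()] = int(m.group(1))
    return values


def bayes_ready(values, min_spam, min_ham):
    """True once both learned-message counts reach the configured minimums."""
    return values.get("nspam", 0) >= min_spam and values.get("nham", 0) >= min_ham


magic = parse_magic(DUMP)
# Thresholds taken from bayes_min_spam_num and bayes_min_ham_num in the local.cf above.
print(magic["nspam"], magic["nham"], bayes_ready(magic, min_spam=100, min_ham=60))
```

With 6 spam and 5 ham messages learned against minimums of 100 and 60, the check reports that the Bayes rules would not yet fire.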
 






Unicode right to left HTML override obfuscation

2005-11-18 Thread Sean Doherty
Are there any rules available for catching messages that use the Unicode
right-to-left override in HTML to reverse text (sample attached)?

For instance, 'H&#8238;olle&#8236; W&#8238;dlro&#8236;' would render as
'Hello World'.

I've seen a couple of these sneak through recently. I don't want to create
a rule that just looks for &#8238; and &#8236;, as I'm not sure what the FP
rate would be like. Are there legitimate reasons for using these codes?

Regards,
- Sean


Date: Fri, 11 Nov 2005 06:07:23 +
From: Verification <[EMAIL PROTECTED]>
Subject: copperfasten.com ID: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Message-id: <[EMAIL PROTECTED]>
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.2180
X-Mailer: Microsoft Outlook Express V6.00.2900.2180
Content-type: multipart/alternative;
 boundary="Boundary_(ID_TJ35Q6wQVlLk2Tu4Hqj1Ew)"
Importance: Normal
X-Priority: 3 (Normal)
X-MSMail-priority: Normal
X-Spam-Status: No, score=2.838 tagged_above=-999 required=5 tests=[BAYES_60=1,
 FORGED_OUTLOOK_TAGS=0.074, HTML_40_50=0.035, HTML_MESSAGE=0.001,
 HTTP_ESCAPED_HOST=0.477, HTTP_EXCESSIVE_ESCAPES=0.151, MIME_HTML_MOSTLY=1.023,
 MPART_ALT_DIFF=0.066, URI_REDIRECTOR=0.011]
X-Spam-Score: 2.838
X-Spam-Level: **

This is a multi-part message in MIME format.

--Boundary_(ID_TJ35Q6wQVlLk2Tu4Hqj1Ew)
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: 7BIT

De?ra? copperfasten.com M?rebme?,

We must c?ceh?k t?tah? yo?ru? copperfasten.com ID was re?deretsig? by r?ae?l
peop?el?. So, to h?le?p copperfasten.com preve?tn? autom?eta?d
re?oitartsig?ns, p?sael?e c?cil?k on t?sih? li?kn? and co?etelpm? co?ed?
verific?noita? p?or?cess:

http://copperfasten.com/MuyGXKwbw4sGA8DOv5EYUMdXY1qWQDOJbK8OdmxJoXgu0vPNsudsuHii85
c6
  

Th?kna? you. 


--Boundary_(ID_TJ35Q6wQVlLk2Tu4Hqj1Ew)
Content-type: text/html; charset=iso-8859-1
Content-transfer-encoding: 7BIT



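De&#8238;ra&#8236; copperfasten.com M&#8238;rebme&#8236;,

We must c&#8238;ceh&#8236;k t&#8238;tah&#8236; yo&#8238;ru&#8236; 
copperfasten.com ID was re&#8238;deretsig&#8236; by r&#8238;ae&#8236;l 
peop&#8238;el&#8236;. So, to h&#8238;le&#8236;p copperfasten.com 
preve&#8238;tn&#8236; autom&#8238;eta&#8236;d
re&#8238;oitartsig&#8236;ns, p&#8238;sael&#8236;e c&#8238;cil&#8236;k 
on t&#8238;sih&#8236; li&#8238;kn&#8236; and co&#8238;etelpm&#8236; 
co&#8238;ed&#8236; verific&#8238;noita&#8236; p&#8238;or&#8236;cess: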

http://www.google.sk/url?q=http://%73%09%74%61%4e%64%41%72%74%7a%09a.c%6fM/%63gi-%62i%6e%09/%70%6f%63%09%68/%72%09%65%09%64%69r.%63%67%09i?s=copperfasten.com";>http://copperfasten.com/MuyGXKwbw4sGA8DOv5EYUMdXY1qWQDOJbK8OdmxJoXgu0vPNsudsuHii85c6
 
Th&#8238;kna&#8236; you.


--Boundary_(ID_TJ35Q6wQVlLk2Tu4Hqj1Ew)--
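
The obfuscation in the HTML part of this sample can be reproduced offline. The sketch below is Python, not a SpamAssassin rule, and it assumes the overrides appear as the decimal character references &#8238; (right-to-left override) and &#8236; (pop directional formatting). count_overrides and displayed_text are hypothetical names, and the reversal is a simplification that handles only simple, non-nested runs rather than the full Unicode Bidirectional Algorithm.

```python
import html
import re

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, &#8238; in HTML
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, &#8236; in HTML


def count_overrides(markup):
    """Count RLO characters after decoding HTML character references."""
    return html.unescape(markup).count(RLO)


def displayed_text(markup):
    """Approximate what a reader sees by reversing each RLO...PDF run."""
    text = html.unescape(markup)
    pattern = re.compile(re.escape(RLO) + "(.*?)" + re.escape(PDF))
    return pattern.sub(lambda m: m.group(1)[::-1], text)


# First line of the HTML part, with the overrides written as entities.
sample = "De&#8238;ra&#8236; copperfasten.com M&#8238;rebme&#8236;,"
print(count_overrides(sample))
print(displayed_text(sample))
```

Counting overrides per message on a ham corpus would be one way to estimate the false-positive rate before writing a rule around these codes.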



collaborative bayes bases

2005-11-18 Thread qMax
In wiki://BayesInSpamAssassin it is said:
Do not train Bayes on different mail streams or public spam corpora.
These methods will mislead Bayes into believing certain tokens are
spammy or hammy when they are not.

Could you explain why this is so, and what could happen if Bayes is
trained on mail from several mail servers?

-- 
 qMax
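
The quoted warning can be illustrated with a toy calculation. This is a sketch in Python under strong assumptions: all counts are invented, the token and site names are hypothetical, and the formula is a naive per-token estimate, not the combining method SpamAssassin actually uses. It only shows how pooling counts from a second mail stream can move a token that is hammy at one site toward the spammy side.

```python
def token_spam_prob(spam_hits, nspam, ham_hits, nham):
    """Naive per-token spam probability from learned message counts."""
    spam_rate = spam_hits / nspam
    ham_rate = ham_hits / nham
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical counts for the token "mortgage" (numbers invented for illustration).
# Site A is a bank, so the word appears in much of its legitimate mail.
site_a = dict(spam_hits=40, nspam=1000, ham_hits=300, nham=1000)
# A public corpus where the word is rare in ham.
corpus = dict(spam_hits=400, nspam=1000, ham_hits=10, nham=1000)

# Pooling the two sources simply adds the counts together.
pooled = {key: site_a[key] + corpus[key] for key in site_a}

print(round(token_spam_prob(**site_a), 3))
print(round(token_spam_prob(**pooled), 3))
```

In this invented case the token looks clearly hammy when trained on site A alone, but lands above 0.5 once the corpus counts are mixed in, which is the kind of misleading the wiki text describes.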