RE: Is Bayes Really Necessary?

2005-06-06 Thread David B Funk

On Mon, 6 Jun 2005 [EMAIL PROTECTED] wrote:

> David Brodbeck wrote:
> > Loren Wilton wrote:
> >> You'd think that there should be some way to do a reverse DNS to
> >> determine from an ip the domains that exist on that ip.  I suspect
> >> though that the whole internet fabric is designed the other way
> >> around, and that this information is probably something that no
> >> single registrar would know.
> >
> > In theory, a reverse lookup could give you all the hostnames
> > associated with that IP.  In reality, almost no one actually sets up
> > multiple reverse DNS records for such sites.  So yes, it's difficult.
>
> Maybe a "reverse SPF" record is called for...
>
> _spf.0.0.10.in-addr.arpa TXT "example.org, some.example.com"...
>

Two-fold problem with either of those solutions:

1) It would depend upon the spammer actually registering and keeping
   accurate that kind of data. (Do you really think that they'll want
   to give the farm away ;).
2) The size of DNS answers would quickly get large enough to cause
   technical problems. DNS normally uses UDP packets to keep overhead
   low (one small packet for query, another for the response). As soon
   as you get more than about 500~1000 bytes of data in an answer you'll
   have to switch to TCP if you want to get the full data. (A lot more
   load on the DNS servers and more network overhead. ;(
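
For a rough sense of the size problem in point 2, here is a
back-of-the-envelope sketch in Python. It assumes the classic 512-byte
UDP limit from RFC 1035 (no EDNS0), a single TXT answer with a
compressed owner name, and the hypothetical "_spf" reverse record
proposed above. The function names are made up for illustration.

```python
CLASSIC_UDP_LIMIT = 512  # bytes; pre-EDNS0 limit for DNS over UDP (RFC 1035)


def wire_name_length(name: str) -> int:
    """Length of a domain name in DNS wire format (no compression)."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    # Each label carries a 1-byte length prefix; the name ends with a 0 byte.
    return sum(len(label) + 1 for label in labels) + 1


def txt_rdata_length(text: str) -> int:
    """Length of TXT RDATA: the text split into strings of at most 255 bytes."""
    raw = text.encode("ascii")
    chunks = max(1, -(-len(raw) // 255))  # ceiling division
    return len(raw) + chunks  # one length byte per string


def response_size(qname: str, domains: list[str]) -> int:
    """Estimated size of a response carrying one TXT record for qname."""
    header = 12
    question = wire_name_length(qname) + 4        # QTYPE + QCLASS
    # Answer: 2-byte compression pointer, then TYPE, CLASS, TTL, RDLENGTH.
    answer = 2 + 10 + txt_rdata_length(", ".join(domains))
    return header + question + answer


def fits_in_classic_udp(qname: str, domains: list[str]) -> bool:
    return response_size(qname, domains) <= CLASSIC_UDP_LIMIT
```

With two short names the answer is tiny, but an IP hosting a hundred
domains would already blow well past the limit under these assumptions.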

-- 
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin   Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{

RE: Is Bayes Really Necessary?

2005-06-06 Thread Matthew.van.Eerde

David Brodbeck wrote:
> Loren Wilton wrote:
>> You'd think that there should be some way to do a reverse DNS to
>> determine from an ip the domains that exist on that ip.  I suspect
>> though that the whole internet fabric is designed the other way
>> around, and that this information is probably something that no
>> single registrar would know. 
> 
> In theory, a reverse lookup could give you all the hostnames
> associated with that IP.  In reality, almost no one actually sets up
> multiple reverse DNS records for such sites.  So yes, it's difficult.

Maybe a "reverse SPF" record is called for...

_spf.0.0.10.in-addr.arpa TXT "example.org, some.example.com"...
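
As a sketch only: this is how a client might build the lookup name
for such a record and parse the answer, in Python. The "_spf" label
under in-addr.arpa is purely the hypothetical record suggested here,
not an existing standard, and both function names are invented for
illustration. The reverse name itself comes from the standard library.

```python
import ipaddress


def reverse_spf_name(ip: str) -> str:
    """Build the hypothetical per-address '_spf' name under in-addr.arpa."""
    # reverse_pointer yields e.g. '1.0.0.10.in-addr.arpa' for 10.0.0.1
    return "_spf." + ipaddress.ip_address(ip).reverse_pointer


def parse_reverse_spf(txt: str) -> list[str]:
    """Split a TXT payload like 'example.org, some.example.com' into names."""
    return [name.strip().lower() for name in txt.split(",") if name.strip()]
```

Nothing here does a DNS query; the TXT payload would come from whatever
resolver library you already use.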

-- 
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
perl -e"map{y/a-z/l-za-k/;print}shift" "Jjhi pcdiwtg Ptga wprztg,"

Re: Is Bayes Really Necessary?

2005-06-06 Thread David Brodbeck


Loren Wilton wrote:

> You'd think that there should be some way to do a reverse DNS to determine
> from an ip the domains that exist on that ip.  I suspect though that the
> whole internet fabric is designed the other way around, and that this
> information is probably something that no single registrar would know.


In theory, a reverse lookup could give you all the hostnames associated 
with that IP.  In reality, almost no one actually sets up multiple 
reverse DNS records for such sites.  So yes, it's difficult.

Re: Is Bayes Really Necessary?

2005-06-04 Thread Loren Wilton

> How exactly do we determine what other sites are hosted on a
> given server, i.e., sites that don't appear in spams?  IOW
> how do you know there's "one internal site"?

You'd think that there should be some way to do a reverse DNS to determine
from an ip the domains that exist on that ip.  I suspect though that the
whole internet fabric is designed the other way around, and that this
information is probably something that no single registrar would know.

In theory I'd think that one could process zone files to determine what
existed on any given ip that was advertised and accessible by name.
However, getting one's hands on the zone files in the first place...

Loren

Re: Is Bayes Really Necessary?

2005-06-04 Thread Jeff Chan

On Saturday, June 4, 2005, 6:20:11 AM, jdow jdow wrote:
> One tiny quibble. For each machine blocked there is perhaps one whole
> internal site that is blocked as well. But it means that site is
> throwing spam out to the universe and the company doing it or the
> individual doing it should stop the practice or take back ownership
> of their machine. THEY might consider themselves "innocent victims."
> But it's the only way if they have one bad egg in their company or
> an infected computer. Either way they really have no solid claim
> on any innocence they may profess.

How exactly do we determine what other sites are hosted on a
given server, i.e., sites that don't appear in spams?  IOW
how do you know there's "one internal site"?

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/

Re: Is Bayes Really Necessary?

2005-06-04 Thread List Mail User

>> >[previous stuff snipped]
>> >Loren
>>
>> Loren is correct. And Jeff and I have had this conversation many times.
>> Jeff would rather not risk the FPs by doing it. I can see his point. But I
>> agree with Loren that we have IPs that are pure spam.
>
>One tiny quibble. For each machine blocked there is perhaps one whole
>internal site that is blocked as well. But it means that site is
>throwing spam out to the universe and the company doing it or the
>individual doing it should stop the practice or take back ownership
>of their machine. THEY might consider themselves "innocent victims."
>But it's the only way if they have one bad egg in their company or
>an infected computer. Either way they really have no solid claim
>on any innocence they may profess.
>
>{^_^}
>

This is just one difference between SURBLs and some other lists
(e.g. the SBL).  The people *do* in some cases have a valid claim of
innocence - even most of the worst spam-hosting offenders have at least
a few legitimate customers, who did not perform adequate due diligence
before signing up/committing to using a "spammer" or "spam friendly"
service.  And unfortunately, for cases like this, ignorance is a valid
defence (though a good lawyer would argue that for medium to large
businesses, such behavior is negligent).

In at least a few cases in which I have been personally involved,
contacting a large company will cause them to move away and fast.  For
example, Ebuyer (from whom I semi-regularly purchase equipment), contracted
with a Brazilian porno outfit to host and mail on their behalf - one
telephone call and within 5 minutes I was connected to a VP - 36 hours
later they were elsewhere, and now operate their own site from the home
office in England.  (They were blacklisted by IP on the SBL, SPEWS and
quite a few other lists for two days and had no idea until it was explained
to them and a non-technical corporate officer was led through checking
things like openrbl.org, Spamhaus, etc.)

Bad business practices do not always translate to guilt, either
(IANAL) legally or (IMNSHO) morally.  Now, if they had stayed there after
being told and the situation explained, they would have lost at least
one customer, who also probably would have made sure they got blacklisted
in many more places:)  Instead, they convinced me that they were a well
meaning company, who made a mistake and acted very quickly to remedy
the situation.  For all those similar companies, with whom I do not do
business or even recognize their names, I'm sure that many are in the
same boat.


Paul Shupak
[EMAIL PROTECTED]

Re: Is Bayes Really Necessary?

2005-06-04 Thread jdow

From: "Chris Santerre" <[EMAIL PROTECTED]>
> >-Original Message-
> >From: Loren Wilton [mailto:[EMAIL PROTECTED]
> >
> >>> If that statement is true, perhaps the surbl lists could automatically
> >>> include the dotquads for hosts that are known to be pure spam sources
> >>> and not mixed systems. Then the client could get the ip for a suspect
> >>> hostname and see if it matched a known spam dotquad.
> >
> >> I'd swear this came up before.  The one (slight?) problem with this
> >> tactic is that you can have too many FPs if a spammer targets a legit
> >> hosting operation.
> >I think there was a failure to read all the words in my
> >original post.
> >
> >I quite specifically suggested that listing ips should be
> >limited to hosts that are known to be pure spam
> >sources.  If the host is KNOWN to be purely spam
> >(ie: it is owned and run by the spammer), I fail completely to
> >see how matching on the known IP for that host can either
> >target or hit innocent bystanders; or indeed bystanders of any sort.
> >
> >It might be argued that making the determination that a host
> >is a pure spam host could be hard.  This may well be true.
> >But despite that, I'd bet that Jeff or Chris could probably
> >list off a dozen or hundred or so hosts that they know quite
> >well serve nothing except spammer domains.  I fail completely
> >to see how matching on the ip for these known hosts can do
> >anything but good, assuming the ip lookup is limited to the
> >resolved ips of urls found in the spam.
> >
> >Loren
>
> Loren is correct. And Jeff and I have had this conversation many times.
> Jeff would rather not risk the FPs by doing it. I can see his point. But I
> agree with Loren that we have IPs that are pure spam.

One tiny quibble. For each machine blocked there is perhaps one whole
internal site that is blocked as well. But it means that site is
throwing spam out to the universe and the company doing it or the
individual doing it should stop the practice or take back ownership
of their machine. THEY might consider themselves "innocent victims."
But it's the only way if they have one bad egg in their company or
an infected computer. Either way they really have no solid claim
on any innocence they may profess.

{^_^}

Re: Is Bayes Really Necessary?

2005-06-03 Thread Jeff Chan

On Friday, June 3, 2005, 3:47:05 AM, Loren Wilton wrote:
>>> If that statement is true, perhaps the surbl lists could automatically
>>> include the dotquads for hosts that are known to be pure spam sources and
>>> not mixed systems.  Then the client could get the ip for a suspect hostname
>>> and see if it matched a known spam dotquad.

>> I'd swear this came up before.  The one (slight?) problem with this tactic
>> is that you can have too many FPs if a spammer targets a legit hosting
>> operation.

> I think there was a failure to read all the words in my original post.

> I quite specifically suggested that listing ips should be limited to hosts
> that are known to be pure spam sources.  If the host is KNOWN to be purely
> spam (ie: it is owned and run by the spammer), I fail completely to see how
> matching on the known IP for that host can either target or hit innocent
> bystanders; or indeed bystanders of any sort.

> It might be argued that making the determination that a host is a pure spam
> host could be hard.  This may well be true.  But despite that, I'd bet that
> Jeff or Chris could probably list off a dozen or hundred or so hosts that
> they know quite well serve nothing except spammer domains.  I fail
> completely to see how matching on the ip for these known hosts can do
> anything but good, assuming the ip lookup is limited to the resolved ips of
> urls found in the spam.

> Loren

It's possible to say some IPs are used in a lot of spam.  Is it
possible to say those IPs are only used in spam?  Sure... if we
were omniscient.  ;-)  Otherwise we don't know for certain
whether there are innocent bystanders there.

It's probably safer to list the URIs that are actually seen in
spams than to blacklist IPs or networks.  The question then
becomes how to get them listed quickly, and if you see the link I
provided you will note that we have a strategy for that which we
will be trying RSN:

  http://www.surbl.org/faq.html#numbered

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/

Re: Is Bayes Really Necessary?

2005-06-03 Thread Alex Broens


List Mail User wrote:


> And adding a URI rule for the completewhois list (basically the same
> function as the no longer existing ipwhois.rfc-ignorant.org list) will hit
> yet more name servers and spammer IPs with slightly fewer FPs (no issue with
> escalations).  The list is: combined-HIB.dnsiplists.completewhois.com


This works miracles, except that their uptime and speed are not always
the best. If we could get more mirrors going it would be a valuable addition.


Alex

Re: Is Bayes Really Necessary?

2005-06-03 Thread List Mail User

>...
>
>On Friday, June 3, 2005, 12:33:26 AM, Duncan Hill wrote:
>> On Friday 03 June 2005 08:10, Loren Wilton typed:
>>> It was basically "the spammer makes a zillion new domains, and they all
>>> take time to get into SURBL, so some spam gets through.  But they all point
>>> to the same dotted quad, and I can match on that lookup".
>>>
>>> If that statement is true, perhaps the surbl lists could automatically
>>> include the dotquads for hosts that are known to be pure spam sources and
>>> not mixed systems.  Then the client could get the ip for a suspect hostname
>>> and see if it matched a known spam dotquad.
>
>> I'd swear this came up before.  The one (slight?) problem with this tactic
>> is that you can have too many FPs if a spammer targets a legit hosting
>> operation.
>
>Exactly.  Listing resolved IPs magnifies the problems with false
>positives, joe jobs and collateral damage.  Please see:
>
>  http://www.surbl.org/faq.html#numbered
>
>"Are there plans to offer an RBL list with the domain names
>resolved into IP addresses?"
>
>> Postfix does have a neat restriction to reject based on the IP address of
>> the name server.  You run the same risk, but I've noticed that the pr1ces,
>> al1v3 and so on spammer has used the same NS servers for each one
>
>Using sbl.spamhaus.org with uridnsbl in SA3 does something
>similar.  SBL has many spammer nameservers listed in it and
>uridnsbl checks a URI's nameservers against SBL.  It tends
>to detect many spammy domains that way (and occasionally a few
>relatively innocent bystanders).
>
>Jeff C.
>-- 
>Jeff Chan
>mailto:[EMAIL PROTECTED]
>http://www.surbl.org/
>
>

And adding a URI rule for the completewhois list (basically the same
function as the no longer existing ipwhois.rfc-ignorant.org list) will hit
yet more name servers and spammer IPs with slightly fewer FPs (no issue with
escalations).  The list is: combined-HIB.dnsiplists.completewhois.com

Paul Shupak
[EMAIL PROTECTED]

P.S.  And if you can afford many more FPs, you can use SPEWS L1 with a low
score (catches far more than the other two combined, but has serious issues
with "escalations" and "innocent bystanders").

RE: Is Bayes Really Necessary?

2005-06-03 Thread Chris Santerre



>-Original Message-
>From: Loren Wilton [mailto:[EMAIL PROTECTED]
>Sent: Friday, June 03, 2005 6:47 AM
>To: Duncan Hill; users@spamassassin.apache.org
>Subject: Re: Is Bayes Really Necessary?
>
>
>>> If that statement is true, perhaps the surbl lists could automatically
>>> include the dotquads for hosts that are known to be pure spam sources and
>>> not mixed systems.  Then the client could get the ip for a suspect hostname
>>> and see if it matched a known spam dotquad.
>
>> I'd swear this came up before.  The one (slight?) problem with this tactic
>> is that you can have too many FPs if a spammer targets a legit hosting
>> operation.
>
>I think there was a failure to read all the words in my 
>original post.  
>
>I quite specifically suggested that listing ips should be 
>limited to hosts that are known to be pure spam 
>sources.  If the host is KNOWN to be purely spam 
>(ie: it is owned and run by the spammer), I fail completely to 
>see how matching on the known IP for that host can either 
>target or hit innocent bystanders; or indeed bystanders of any sort.
>
>It might be argued that making the determination that a host 
>is a pure spam host could be hard.  This may well be true.  
>But despite that, I'd bet that Jeff or Chris could probably 
>list off a dozen or hundred or so hosts that they know quite 
>well serve nothing except spammer domains.  I fail completely 
>to see how matching on the ip for these known hosts can do 
>anything but good, assuming the ip lookup is limited to the 
>resolved ips of urls found in the spam.
>
>Loren

Loren is correct. And Jeff and I have had this conversation many times. Jeff
would rather not risk the FPs by doing it. I can see his point. But I agree
with Loren that we have IPs that are pure spam. 

But we watch those on the backend like Loren said. Getting more automated as
well. So rather than do the extra processing up front, our research just
pays more attention to those 'pure evil' hosts. Which is one of the reasons
the domains fall into black.uribl.com so fast. 

I won't release the list of IPs I have now. Not yet anyway. Don't want them
to move :)

Chris Santerre 
System Admin and SARE/URIBL Ninja
http://www.rulesemporium.com 
http://www.uribl.com

Re: Is Bayes Really Necessary?

2005-06-03 Thread Loren Wilton

>> If that statement is true, perhaps the surbl lists could automatically
>> include the dotquads for hosts that are known to be pure spam sources and
>> not mixed systems.  Then the client could get the ip for a suspect hostname
>> and see if it matched a known spam dotquad.

> I'd swear this came up before.  The one (slight?) problem with this tactic is 
> that you can have too many FPs if a spammer targets a legit hosting 
> operation.

I think there was a failure to read all the words in my original post.  

I quite specifically suggested that listing ips should be limited to hosts 
that are known to be pure spam sources.  If the host is KNOWN 
to be purely spam (ie: it is owned and run by the spammer), I fail completely 
to see how matching on the known IP for that host can either target or hit 
innocent bystanders; or indeed bystanders of any sort.

It might be argued that making the determination that a host is a pure spam 
host could be hard.  This may well be true.  But despite that, I'd bet that 
Jeff or Chris could probably list off a dozen or hundred or so hosts that they 
know quite well serve nothing except spammer domains.  I fail completely to see 
how matching on the ip for these known hosts can do anything but good, assuming 
the ip lookup is limited to the resolved ips of urls found in the spam.

Loren

Re: Is Bayes Really Necessary?

2005-06-03 Thread Jeff Chan

On Friday, June 3, 2005, 12:33:26 AM, Duncan Hill wrote:
> On Friday 03 June 2005 08:10, Loren Wilton typed:
>> It was basically "the spammer makes a zillion new domains, and they all
>> take time to get into SURBL, so some spam gets through.  But they all point
>> to the same dotted quad, and I can match on that lookup".
>>
>> If that statement is true, perhaps the surbl lists could automatically
>> include the dotquads for hosts that are known to be pure spam sources and
>> not mixed systems.  Then the client could get the ip for a suspect hostname
>> and see if it matched a known spam dotquad.

> I'd swear this came up before.  The one (slight?) problem with this tactic is 
> that you can have too many FPs if a spammer targets a legit hosting 
> operation.

Exactly.  Listing resolved IPs magnifies the problems with false
positives, joe jobs and collateral damage.  Please see:

  http://www.surbl.org/faq.html#numbered

"Are there plans to offer an RBL list with the domain names
resolved into IP addresses?"

> Postfix does have a neat restriction to reject based on the IP address of the 
> name server.  You run the same risk, but I've noticed that the pr1ces, al1v3 
> and so on spammer has used the same NS servers for each one

Using sbl.spamhaus.org with uridnsbl in SA3 does something
similar.  SBL has many spammer nameservers listed in it and
uridnsbl checks a URI's nameservers against SBL.  It tends
to detect many spammy domains that way (and occasionally a few
relatively innocent bystanders).
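
To make the mechanics concrete, here is a minimal Python sketch of the
usual DNSBL convention (reverse the IPv4 octets and append the list
zone) applied to a URI's nameserver addresses. It is not SpamAssassin's
code. The function names are invented, and the actual DNS lookup is
passed in as an is_listed callable so that the sketch makes no network
calls.

```python
import ipaddress
from typing import Callable, Iterable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNSBL query name for an IPv4 address, e.g. 1.2.0.192.<zone>."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def listed_nameserver_ips(
    ns_ips: Iterable[str],
    zone: str,
    is_listed: Callable[[str], bool],
) -> list[str]:
    """Return the nameserver IPs whose DNSBL query name is reported as listed.

    is_listed is an assumption of this sketch: a caller-supplied function
    that answers True when the query name resolves in the list zone.
    """
    return [ip for ip in ns_ips if is_listed(dnsbl_query_name(ip, zone))]
```

A caller would first resolve the NS records of the URI's domain to
addresses, then treat any non-empty result as a hit.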

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/

Re: Is Bayes Really Necessary?

2005-06-03 Thread Duncan Hill

On Friday 03 June 2005 08:10, Loren Wilton typed:
> It was basically "the spammer makes a zillion new domains, and they all
> take time to get into SURBL, so some spam gets through.  But they all point
> to the same dotted quad, and I can match on that lookup".
>
> If that statement is true, perhaps the surbl lists could automatically
> include the dotquads for hosts that are known to be pure spam sources and
> not mixed systems.  Then the client could get the ip for a suspect hostname
> and see if it matched a known spam dotquad.

I'd swear this came up before.  The one (slight?) problem with this tactic is 
that you can have too many FPs if a spammer targets a legit hosting 
operation.

Postfix does have a neat restriction to reject based on the IP address of the 
name server.  You run the same risk, but I've noticed that the pr1ces, al1v3 
and so on spammer has used the same NS servers for each one.

Re: Is Bayes Really Necessary?

2005-06-03 Thread Loren Wilton

> SURBLs on the other hand have mostly domain names with a few IPs.
> Whatever appears in URI host portions is what goes into SURBLs.
> Usually URIs have domain names so that's what most of the SURBL
> records are.

Jeff, the OP (or someone) had an interesting idea, I thought.

It was basically "the spammer makes a zillion new domains, and they all take
time to get into SURBL, so some spam gets through.  But they all point to
the same dotted quad, and I can match on that lookup".

If that statement is true, perhaps the surbl lists could automatically
include the dotquads for hosts that are known to be pure spam sources and
not mixed systems.  Then the client could get the ip for a suspect hostname
and see if it matched a known spam dotquad.

Possibly this would want to be a separate list.

Alternately, it might be possible to do this as 'backend processing' inside
surbl itself.  For instance, you could run your own caching dns.  Any hostname
lookup request not matching the current list (or the whitelist) gets looked
up.  If the ip address matches that of a known spam host, it is
automatically added to the list and a positive hit is returned to the
original requestor.  Instant catching of unknown spam domains!

Of course with your policies you may simply want to add the domain name to a
list for manual review rather than directly including it.  Or perhaps
establish a new list that is scored deliberately at half the normal surbl
score and add it to that list and flag for manual review.  If it is spam, it
will provide at least some early warning to people receiving it.  If it
turns out to be a false hit, it will be found in manual review and removed
from the list shortly, and in the mean time the low score means no great
harm will likely be done.
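
A rough Python sketch of that flow, just to show the idea. Everything
here is hypothetical: the score values, the function name, and the
resolve callable (which stands in for the caching dns) are assumptions
of the sketch, not anything SURBL actually runs.

```python
from typing import Callable, Optional

FULL_SCORE = 1.0  # hypothetical weight for a normal listing
HALF_SCORE = 0.5  # "half the normal surbl score" for provisional hits


def check_host(
    hostname: str,
    listed: set[str],
    whitelist: set[str],
    provisional: set[str],
    spam_ips: set[str],
    resolve: Callable[[str], Optional[str]],
) -> float:
    """Score a hostname, adding unknown names on known spam IPs to a review list."""
    if hostname in whitelist:
        return 0.0
    if hostname in listed:
        return FULL_SCORE
    if hostname in provisional:
        return HALF_SCORE
    # Unknown name: resolve it and compare against known pure-spam hosts.
    ip = resolve(hostname)
    if ip is not None and ip in spam_ips:
        provisional.add(hostname)  # flagged for manual review
        return HALF_SCORE
    return 0.0
```

The provisional set doubles as the manual review queue, so a false hit
can be removed from it without ever having carried the full score.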

I think this is a concept worth thinking about.  Domain names are near
infinite, but there is a limit on IPv4 addresses; so a lot of domain
names must end up mapping to the same ip address in some way or other.  This
is something that we should be able to exploit.

Loren

Re: Is Bayes Really Necessary?

2005-06-02 Thread Jeff Chan

On Thursday, May 26, 2005, 12:49:05 PM, Evan Langlois wrote:
> On Thu, 2005-05-26 at 10:42 -0400, Chris Santerre wrote:

>> For site wide, I'm pretty much against it. I know people will argue that
>> point. I'm obviously biased towards SARE rules updated with RDJ. And the use
>> of URIBL.com lists. But these allow general users, or a sitewide install,
>> to "set and forget". Which is what we strive for, so SA can be more widely
>> accepted. 
>> 
>> I have a 99% filter rate without bayes. And I'm proud of that. 

> I've been testing URIBL and SURBL against just reversing the hostnames
> and looking it up on SBL-XBL,

SBL and XBL have numeric IP addresses, so they shouldn't match
host names.

SURBLs on the other hand have mostly domain names with a few IPs.
Whatever appears in URI host portions is what goes into SURBLs.
Usually URIs have domain names so that's what most of the SURBL
records are.
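
As an illustration of the distinction, here is a small Python sketch
that pulls the host portion out of a URI and says whether it is a name
or a numeric IP. The function names are invented for this example, and
it deliberately skips the step of reducing a hostname to its registered
domain, which needs a TLD list.

```python
import ipaddress
from urllib.parse import urlsplit


def uri_host(uri: str) -> str:
    """Return the lowercased host portion of a URI, or '' if there is none."""
    return urlsplit(uri).hostname or ""


def is_ip_host(host: str) -> bool:
    """True when the host is a literal IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def classify_uri(uri: str) -> tuple[str, str]:
    """Return ('name' | 'ip' | 'none', host) for a URI found in a message."""
    host = uri_host(uri)
    if not host:
        return ("none", "")
    return ("ip" if is_ip_host(host) else "name", host)
```

Only the "ip" case could sensibly be checked against an IP-based list;
the "name" case is what a domain-based list is keyed on.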

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/

RE: Is Bayes Really Necessary?

2005-05-27 Thread Chris Santerre



>-Original Message-
>From: Jake Colman [mailto:[EMAIL PROTECTED]
>Sent: Friday, May 27, 2005 9:47 AM
>To: users@spamassassin.apache.org
>Subject: Re: Is Bayes Really Necessary?
>
>
>
>OK.  I misunderstood.  The URIBLS are working fine.  Interestingly,
>although I use the SARE rules and URIBLS, some spam is still slipping
>through.  This spam is fairly obvious spam so I am a bit surprised.
>Should I be tweaking the scoring?
>


Need an example with header info.

--Chris

Re: Is Bayes Really Necessary?

2005-05-27 Thread Jake Colman

OK.  I misunderstood.  The URIBLS are working fine.  Interestingly, although
I use the SARE rules and URIBLS, some spam is still slipping through.  This
spam is fairly obvious spam so I am a bit surprised.  Should I be tweaking
the scoring?

> "MK" == Matt Kettler <[EMAIL PROTECTED]> writes:

   MK> Jake Colman wrote:
   >>> "CS" == Chris Santerre <[EMAIL PROTECTED]> writes:
   >> 
   CS> If you are using SA 3.x, support is already included. You simply have
   CS> to create the config file, restart spamd, and *poof* way less spam.
   >> 
   CS> Net::Dns is required. I forget which version. I forget a lot of
   CS> stuff. What was the question?
   >> 
   >> Chris,
   >> 
   >> Now I'm confused.  The usage page on the site says to create a simple .cf
   >> file containing a number of lines.  Is that it?  If I have that .cf file
   >> in my /etc/mail/spamassassin directory it will all simply work?
   >> ...Jake
   >> 

   MK> Jake, that "simple cf file" *should* already be included by default
   MK> with SA 3.0.x. You really shouldn't have to create a config file, or
   MK> do anything at all to get URIBLs going.

   MK> http://www.surbl.org/ mentions suggestions about adding rules, but most
   MK> of the surbl lists are already built into SA 3.0. The only one that's
   MK> missing is the JP list, which came on-line too late to make it into the
   MK> 3.0 release. Add it if you want, but do so AFTER you get the built-in
   MK> ones going.

   MK> If the URIBLs aren't going, check these two things:

   MK> 1) check to make sure you have /etc/mail/spamassassin/init.pre. Some
   MK> distribution packages left this file out when they converted the
   MK> tarball (oops). Without the init.pre, the plugin for URIBLs doesn't get
   MK> loaded.

   MK> It should have this statement in it to support URIBLs:

   MK> loadplugin Mail::SpamAssassin::Plugin::URIDNSBL

   >> Yes, I have Net::DNS since I am already doing all the other net checks.
   >> 

   MK> 2) Just because your copy of Net::DNS works for RBLs does not mean it
   MK> will work for the URIBLs. You need a higher version of Net::DNS to
   MK> support URIBLs than you need for normal net checks.

   MK> Check spamassassin --lint -D to see if it's complaining about the
   MK> version of Net::DNS.

-- 
Jake Colman
Sr. Applications Developer
Principia Partners LLC
Harborside Financial Center
1001 Plaza Two
Jersey City, NJ 07311
(201) 209-2467
www.principiapartners.com

Re: Is Bayes Really Necessary?

2005-05-27 Thread jdow

From: "David B Funk" <[EMAIL PROTECTED]>

> As spammers are constantly mutating and adapting, having a dynamic,
> adaptive component of SA is a must to avoid the "saw-tooth" effect.
> (a fresh SA install works great, gradually loses effectiveness until a
> new update install, and so on).

Um, yeah, you make a fresh install with no SARE rules and it's REALLY
bad. It saw tooths upwards as you break down and install more SARE rules.
Then a periodic update keeps you up there quite nicely.

Seriously, I was AMAZED at how bad a raw 3.02 install was here until I
put in the SARE rules, even after I got the Bayes trained. (Did that
right away off my saved ham and spam database.)

{^_-}

Re: Is Bayes Really Necessary?

2005-05-27 Thread jdow

From: "Jim Maul" <[EMAIL PROTECTED]>

> Gotta stop smokin the green ;)

Yeah, it's better if you shovel the random greens you find into the
compost pit. Not many people will look for them in a compost pit when
they get reported as missing persons.

{O,o}

Re: Is Bayes Really Necessary?

2005-05-27 Thread jdow

From: "List Mail User" <[EMAIL PROTECTED]>

> Though nobody seems to have said it exactly this way:  It seems
> to be becoming very obvious that the people who say they have problems
> with Bayes are those who support a diverse group of users (e.g. ISPs
> and email providers) and those who find it works well, even with
> autolearning, are those with either small numbers of users or users who
> are mostly of a very specific categorization type (e.g. medical, legal,
> technical, or just about any homogeneous group).

I suspect you are right, Paul. And I restrict the group a little farther
to suggest it is large ISPs with diverse customer bases and global Bayes
who have the most trouble. Per user Bayes, a good set of SARE rules, and
significantly widened autolearn thresholds from base install levels may
be their solution.

Global Bayes is probably the ISP poison proposition. And autolearn with
normal thresholds is probably further poison.

But then, I run manual learn, private Bayes, and LOTS of rules. (40 sets
of SARE rules plus my own largish set of rules that apply to me but not
others works nicely along with the private Bayes)

{^_-}

Re: Is Bayes Really Necessary?

2005-05-27 Thread jdow

From: "Matt Kettler" <[EMAIL PROTECTED]>
(Sneaky one you are - you got around my Reply-To markup for this list. For
that you get an extra copy. {^_-})

> jdow wrote:
> > One way to keep Bayes from running is to never train it.
> > {^_^}
>
> You'd also disable autolearning. By default SA will eventually autolearn
> enough email to begin using bayes. (and often these pure auto-learn only
> DBs end up with very bad results.)

I said what you could do. I left how as an exercise for the student.

I figure if he tries without Bayes for awhile (kill all training and
move the bayes database into a corner somewhere that SA cannot find)
he may find his one true answer for his question.

{^_-}   <- Self has determined for her situation Bayes is necessary.

Re: Is Bayes Really Necessary?

2005-05-26 Thread Matt Kettler

Jake Colman wrote:
>>"CS" == Chris Santerre <[EMAIL PROTECTED]> writes:
> 
>CS> If you are using SA 3.x, support is already included. You simply have
>CS> to create the config file, restart spamd, and *poof* way less spam.
> 
>CS> Net::Dns is required. I forget which version. I forget a lot of
>CS> stuff. What was the question?
> 
> Chris,
> 
> Now I'm confused.  The usage page on the site says to create a simple .cf
> file containing a number of lines.  Is that it?  If I have that .cf file in
> my /etc/mail/spamassassin directory it will all simply work? 
> ...Jake
> 

Jake, that "simple cf file" *should* already be included by default with SA 3.0.x.
You really shouldn't have to create a config file, or do anything at all to get
URIBLs going.

http://www.surbl.org/  mentions suggestions about adding rules, but most of the
surbl lists are already built into SA 3.0. The only one that's missing is the JP
list, which came on-line too late to make it into the 3.0 release. Add it if you
want, but do so AFTER you get the built-in ones going.

If the URIBLs aren't going, check these two things:

1) check to make sure you have /etc/mail/spamassassin/init.pre. Some
distribution packages left this file out when they converted the tarball (oops).
Without the init.pre, the plugin for URIBLs doesn't get loaded.

It should have this statement in it to support URIBLs:

loadplugin Mail::SpamAssassin::Plugin::URIDNSBL

>  Yes, I have Net::DNS since I am already doing all the other net checks.
> 

2) Just because your copy of Net::DNS works for RBLs does not mean it will work
for the URIBLs. You need a higher version of Net::DNS to support URIBLs than you
need for normal net checks.

Run spamassassin --lint -D and check whether it's complaining about the
version of Net::DNS.

Re: Is Bayes Really Necessary?

2005-05-26 Thread Jake Colman

> "CS" == Chris Santerre <[EMAIL PROTECTED]> writes:

   >> I already use RDJ and the automatic updater.  How do I use URIBL?  I
   >> looked at the usage page and I undersyand that I need to create a .cf
   >> file but how does it access the lists?

   CS> If you are using SA 3.x, support is already included. You simply have
   CS> to create the config file, restart spamd, and *poof* way less spam.

   CS> Net::Dns is required. I forget which version. I forget a lot of
   CS> stuff. What was the question?

Chris,

Now I'm confused.  The usage page on the site says to create a simple .cf
file containing a number of lines.  Is that it?  If I have that .cf file in
my /etc/mail/spamassassin directory it will all simply work?  Yes, I have
Net::DNS since I am already doing all the other net checks.

...Jake

-- 
Jake Colman
Sr. Applications Developer
Principia Partners LLC
Harborside Financial Center
1001 Plaza Two
Jersey City, NJ 07311
(201) 209-2467
www.principiapartners.com

Re: Is Bayes Really Necessary?

2005-05-26 Thread David B Funk

On Thu, 26 May 2005, Thomas Cameron wrote:

> On Thu, 2005-05-26 at 10:08 -0400, Jake Colman wrote:
> > Given the rather complete set of rules that ship with SA and which can
> > expanded with SARE, does bayes learning really help?  Won't the rules catch
> > pretty much everything anyway?
>
> I have used SA with Bayes and it took quite a bit of administrative
> overhead.  It worked amazingly well, though.
>
> I now run SA with DCC, Razor, Pyzor and network checks and without Bayes
> and it still Just Works(TM).  Seriously - I have customers who slather

You could make the argument that Razor, Pyzor, etc perform a similar
function to Bayes (analyze a message, generate some kind of 'collapsed'
representation, compare it with a database of known messages
and come up with a "spammyness" value).
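
As a rough illustration of the "collapse and compare" idea above, here is a minimal Python sketch. It is not how Razor or Pyzor actually compute signatures (they use their own fuzzy digests and a shared network database); the SHA-1 of lowercased, whitespace-normalized text and the in-memory set `known_spam` are stand-ins assumed purely for illustration.

```python
import hashlib

def collapse(message: str) -> str:
    """Reduce a message body to a short digest (stand-in for a real fuzzy signature)."""
    # Lowercase and squeeze runs of whitespace so trivial reformatting
    # does not change the digest.
    normalized = " ".join(message.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# Hypothetical local "database" of digests of messages already reported as spam.
known_spam = {collapse("Buy cheap pills NOW")}

def looks_known(message: str) -> bool:
    """True if the collapsed form of the message matches a reported digest."""
    return collapse(message) in known_spam

print(looks_known("buy   cheap pills\nnow"))
print(looks_known("Minutes of the faculty meeting"))
```

An exact hash like this is defeated by changing a single word, which is why the real services rely on fuzzier representations.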

As spammers are constantly mutating and adapting, having a dynamic,
adaptive component of SA is a must to avoid the "saw-tooth" effect.
(a fresh SA install works great, gradually loses effectiveness until a
new update install, and so on).

Bayes has the advantage that it's local: there is no network overhead, and
it can be trained to 'know' your specific kinds of messages.

Bayes has the disadvantage that it's your local responsibility to
see that it's trained properly.

-- 
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{

RE: Is Bayes Really Necessary?

2005-05-26 Thread Evan Langlois

On Thu, 2005-05-26 at 10:42 -0400, Chris Santerre wrote:

> For site wide, I'm pretty much against it. I know people will argue that
> point. I'm obviously biased towards SARE rules updated with RDJ. And the use
> of URIBL.com lists. But these allow a general users, or a sitewide install
> to "set and forget". Which is what we strive for, so SA can be more widley
> excepted. 
> 
> I have a 99% filter rate without bayes. And I'm proud of that. 

I've been testing URIBL and SURBL against just reversing the hostnames
and looking them up on SBL-XBL, and I can say that URIBL and SURBL don't
catch nearly as many spams.  I get close to a 99% filter rate just
checking the links alone.
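
For readers wondering what these lookups look like: both kinds of list are queried through ordinary DNS. A minimal Python sketch of how the query names are formed follows. It assumes that "reversing" means resolving the link's hostname to an IPv4 address and reversing its octets, which is the usual DNSBL convention. The zone names `sbl-xbl.spamhaus.org` and `multi.surbl.org` are the ones commonly published for those lists and should be checked against the lists' own documentation. The function names are made up for the example, and no DNS query is actually sent.

```python
import ipaddress

def ip_dnsbl_name(ip: str, zone: str = "sbl-xbl.spamhaus.org") -> str:
    """Build the query name for an IP-based list: reversed octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def uri_dnsbl_name(domain: str, zone: str = "multi.surbl.org") -> str:
    """Build the query name for a URI list: the link's domain + zone."""
    return domain.strip(".").lower() + "." + zone

print(ip_dnsbl_name("192.0.2.10"))
print(uri_dnsbl_name("Example.COM"))
```

In both cases an address record in the answer (conventionally in 127.0.0.0/8) means "listed", and a nonexistent-name answer means "not listed".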

Re: Is Bayes Really Necessary?

2005-05-26 Thread Jim Maul

Chris Santerre wrote:

> > -Original Message-
> > From: Jake Colman [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, May 26, 2005 2:54 PM
> > To: users@spamassassin.apache.org
> > Subject: Re: Is Bayes Really Necessary?
> >
> > "CS" == Chris Santerre <[EMAIL PROTECTED]> writes:
> >
> >  >> -Original Message-
> >  >> From: Jake Colman [mailto:[EMAIL PROTECTED]
> >  >> Sent: Thursday, May 26, 2005 10:09 AM
> >  >> To: users@spamassassin.apache.org
> >  >> Subject: Is Bayes Really Necessary?
> >  >>
> >  >> Given the rather complete set of rules that ship with SA and which can
> >  >> expanded with SARE, does bayes learning really help?  Won't the rules
> >  >> catch pretty much everything anyway?
> >
> >  CS> Oh my favorite subject!!! :)
> >
> >  CS> NO! Bayes is not necessary. IMHO, for personal use, it is incredible.
> >  CS> But I feel the care of it is more difficult then your average user
> >  CS> would care to keep up.
> >
> >  CS> For site wide, I'm pretty much against it. I know people will argue
> >  CS> that point. I'm obviously biased towards SARE rules updated with RDJ.
> >  CS> And the use of URIBL.com lists. But these allow a general users, or a
> >  CS> sitewide install to "set and forget". Which is what we strive for, so
> >  CS> SA can be more widley excepted.
> >
> >  CS> I have a 99% filter rate without bayes. And I'm proud of that.
> >
> >  CS> Chris Santerre
> >  CS> System Admin and SARE/URIBL Ninja
> >  CS> http://www.rulesemporium.com
> >  CS> http://www.uribl.com
> >
> > I already use RDJ and the automatic updater.  How do I use URIBL?  I looked
> > at the usage page and I undersyand that I need to create a .cf file but how
> > does it access the lists?
>
> If you are using SA 3.x, support is already included. You simply have to
> create the config file, restart spamd, and *poof* way less spam.
>
> Net::Dns is required. I forget which version. I forget a lot of stuff. What
> was the question?
>
> --Chris

Gotta stop smokin the green ;)

-Jim

RE: Is Bayes Really Necessary?

2005-05-26 Thread Chris Santerre



>-Original Message-
>From: Jake Colman [mailto:[EMAIL PROTECTED]
>Sent: Thursday, May 26, 2005 2:54 PM
>To: users@spamassassin.apache.org
>Subject: Re: Is Bayes Really Necessary?
>
>
>>>>>> "CS" == Chris Santerre <[EMAIL PROTECTED]> writes:
>
>   >> -Original Message-
>   >> From: Jake Colman [mailto:[EMAIL PROTECTED]
>   >> Sent: Thursday, May 26, 2005 10:09 AM
>   >> To: users@spamassassin.apache.org
>   >> Subject: Is Bayes Really Necessary?
>   >> 
>   >> 
>   >> 
>   >> Given the rather complete set of rules that ship with SA 
>and which can
>   >> expanded with SARE, does bayes learning really help?  Won't 
>   >> the rules catch
>   >> pretty much everything anyway?
>
>   CS> Oh my favorite subject!!! :) 
>
>   CS> NO! Bayes is not necessary. IMHO, for personal use, it 
>is incredible. But I
>   CS> feel the care of it is more difficult then your average 
>user would care to
>   CS> keep up. 
>
>   CS> For site wide, I'm pretty much against it. I know 
>people will argue that
>   CS> point. I'm obviously biased towards SARE rules updated 
>with RDJ. And the use
>   CS> of URIBL.com lists. But these allow a general users, or 
>a sitewide install
>   CS> to "set and forget". Which is what we strive for, so SA 
>can be more widley
>   CS> excepted. 
>
>   CS> I have a 99% filter rate without bayes. And I'm proud of that. 
>
>   CS> Chris Santerre 
>   CS> System Admin and SARE/URIBL Ninja
>   CS> http://www.rulesemporium.com 
>   CS> http://www.uribl.com
>
>I already use RDJ and the automatic updater.  How do I use 
>URIBL?  I looked
>at the usage page and I undersyand that I need to create a .cf 
>file but how
>does it access the lists?

If you are using SA 3.x, support is already included. You simply have to
create the config file, restart spamd, and *poof* way less spam. 

Net::DNS is required. I forget which version. I forget a lot of stuff. What
was the question?

--Chris

Re: Is Bayes Really Necessary?

2005-05-26 Thread Jake Colman

> "CS" == Chris Santerre <[EMAIL PROTECTED]> writes:

   >> -Original Message-
   >> From: Jake Colman [mailto:[EMAIL PROTECTED]
   >> Sent: Thursday, May 26, 2005 10:09 AM
   >> To: users@spamassassin.apache.org
   >> Subject: Is Bayes Really Necessary?
   >> 
   >> 
   >> 
   >> Given the rather complete set of rules that ship with SA and which can
   >> expanded with SARE, does bayes learning really help?  Won't 
   >> the rules catch
   >> pretty much everything anyway?

   CS> Oh my favorite subject!!! :) 

   CS> NO! Bayes is not necessary. IMHO, for personal use, it is incredible. 
But I
   CS> feel the care of it is more difficult then your average user would care 
to
   CS> keep up. 

   CS> For site wide, I'm pretty much against it. I know people will argue that
   CS> point. I'm obviously biased towards SARE rules updated with RDJ. And the 
use
   CS> of URIBL.com lists. But these allow a general users, or a sitewide 
install
   CS> to "set and forget". Which is what we strive for, so SA can be more 
widley
   CS> excepted. 

   CS> I have a 99% filter rate without bayes. And I'm proud of that. 

   CS> Chris Santerre 
   CS> System Admin and SARE/URIBL Ninja
   CS> http://www.rulesemporium.com 
   CS> http://www.uribl.com

I already use RDJ and the automatic updater.  How do I use URIBL?  I looked
at the usage page and I understand that I need to create a .cf file, but how
does it access the lists?

-- 
Jake Colman
Sr. Applications Developer
Principia Partners LLC
Harborside Financial Center
1001 Plaza Two
Jersey City, NJ 07311
(201) 209-2467
www.principiapartners.com

Re: Is Bayes Really Necessary?

2005-05-26 Thread Dimitri Yioulos

On Thursday May 26 2005 1:13 pm, Loren Wilton wrote:
> > Given the rather complete set of rules that ship with SA and which can
> > expanded with SARE, does bayes learning really help?  Won't the rules
>
> catch
>
> > pretty much everything anyway?
>
> Um, maybe, maybe not.
>
> Bayes *necessary*?  No, especially if you run net tests.
> Bayes *highly desirable*?  Yup.  An additional 4 points can really help
> when a new spam shows up that you don't have a lot of rules for.
>
> Loren

Loren's point well taken.  I think it's the use of bayes in conjunction with 
other rules that tends to work best. At least, that's my experience.

Dimitri

Re: Is Bayes Really Necessary?

2005-05-26 Thread Loren Wilton

> Given the rather complete set of rules that ship with SA and which can
> expanded with SARE, does bayes learning really help?  Won't the rules
catch
> pretty much everything anyway?

Um, maybe, maybe not.

Bayes *necessary*?  No, especially if you run net tests.
Bayes *highly desirable*?  Yup.  An additional 4 points can really help when
a new spam shows up that you don't have a lot of rules for.

Loren

Re: Is Bayes Really Necessary?

2005-05-26 Thread List Mail User

Though nobody seems to have said it exactly this way:  It seems
to be becoming very obvious that the people who say they have problems
with Bayes are those who support a diverse group of users (e.g. ISPs
and email providers), and those who find it works well, even with autolearning,
are those with either small numbers of users or users who mostly fall into
one specific category (e.g. medical, legal, technical, or
just about any homogeneous group).
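
A toy illustration of why the mix of mail matters. This is a Python sketch, not SpamAssassin's actual Bayes implementation, which tokenizes messages and combines probabilities differently. The tiny corpora, the function name and the add-one smoothing are all assumptions made up for the example.

```python
def token_spam_prob(token, ham_msgs, spam_msgs):
    """Estimate P(spam | token) from message counts, with add-one smoothing."""
    ham_hits = sum(token in m.lower().split() for m in ham_msgs)
    spam_hits = sum(token in m.lower().split() for m in spam_msgs)
    # Smoothed fraction of spam (resp. ham) messages containing the token.
    p_tok_spam = (spam_hits + 1) / (len(spam_msgs) + 2)
    p_tok_ham = (ham_hits + 1) / (len(ham_msgs) + 2)
    return p_tok_spam / (p_tok_spam + p_tok_ham)

spam = ["cheap mortgage offer", "mortgage rates cut today"]
law_firm_ham = ["mortgage deed for the closing", "mortgage lien filed today"]
mixed_isp_ham = ["see you at dinner", "photos from the trip"]

print(token_spam_prob("mortgage", law_firm_ham, spam))   # neutral token
print(token_spam_prob("mortgage", mixed_isp_ham, spam))  # looks spammy
```

With the homogeneous ham set the token comes out neutral, while the shared database at the diverse site learns it as a spam sign, which is wrong for the minority of users who legitimately write about it.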

Despite the oft-repeated claim that spammers are dumb, not all are.  And
the "Bayes poison" we all see added to spam must work against some group, and
I would guess that it is exactly those sites that have diverse user bases
and have primarily "personal conversational" content in lots of the email
running through their systems.

For me, the few times I see Bayes give apparently wrong answers are
in email from friends and family, and never from clients or technical contacts
(and it is certainly made worse by the fact that many members of my family have
spent their entire careers in marketing - they often get BAYES_80 scores when
writing me).  This lends support to the notion that the added text does indeed
match some types of common communication.

If my supposition is correct, the question then becomes:  Can using
personal (i.e. per user) Bayes overcome the problems which some users/sites
see?  I'm not sure how to test this - certainly I couldn't myself, but maybe
some of the other members of this list are able to and could try.  Even if it
does work, the resource load may be too high to be reasonable for many large
sites.


Paul Shupak
[EMAIL PROTECTED]

Re: Is Bayes Really Necessary?

2005-05-26 Thread Eric A. Hall

On 5/26/2005 10:08 AM, Jake Colman wrote:
> Given the rather complete set of rules that ship with SA and which can
> expanded with SARE, does bayes learning really help?  Won't the rules catch
> pretty much everything anyway?

The base SA install is insufficient, but if you tweak the scores and add
some additional tests, you can get by without bayes just fine. I use a
select set of RBLs, Razor, rulesets from rulesemporium, and my own
LDAP-based weighting plugin, and my highest spam only gets an average of
one spam per day, and even those are over the 5.0 threshold (so they are
auto-filed into the Junk Email folder).

Bayes is great for per-user stuff, but unless you are willing to manage
the per-user databases (which I'm not), it is easier to just tweak the
system scores and rules. Less management overhead, less CPU, etc.

-- 
Eric A. Hall             http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/

Re: Is Bayes Really Necessary?

2005-05-26 Thread Jim Maul


Matt Kettler wrote:
> jdow wrote:
>> One way to keep Bayes from running is to never train it.
>> {^_^}
>
> You'd also disable autolearning. By default SA will eventually autolearn enough
> email to being using bayes. (and often these pure auto-learn only DBs end up
> with very bad results.)




Often is the keyword here.  I guess I'm the exception to that norm ;)
But then again, I altered my autolearn thresholds to -0.1 ham/12.0 spam.
I believe this is key to using autolearning correctly. (I don't mean
these numbers specifically, just the concept.)
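
The concept of pushing both thresholds away from the middle can be sketched as follows. This is a simplified Python model, not SpamAssassin's code: the real autolearn decision works from a score computed without the Bayes rules and applies extra conditions before learning a message as spam. The function name and the handling of scores exactly on a threshold are assumptions; the -0.1 and 12.0 values are the ones quoted above.

```python
from typing import Optional

HAM_THRESHOLD = -0.1   # learn as ham only below this score
SPAM_THRESHOLD = 12.0  # learn as spam only above this score

def autolearn_decision(score: float) -> Optional[str]:
    """Return 'ham', 'spam', or None (do not learn) for a message score."""
    if score < HAM_THRESHOLD:
        return "ham"
    if score > SPAM_THRESHOLD:
        return "spam"
    return None  # the wide middle band is never fed to the learner

for s in (-2.3, 0.0, 6.5, 15.1):
    print(s, autolearn_decision(s))
```

In the SpamAssassin configuration the two values correspond to the bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings. The wider the gap between them, the fewer borderline messages are learned automatically.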


-Jim

Re: Is Bayes Really Necessary?

2005-05-26 Thread Matt Kettler

jdow wrote:
> One way to keep Bayes from running is to never train it.
> {^_^}

You'd also have to disable autolearning. By default SA will eventually autolearn
enough email to begin using Bayes. (And often these pure auto-learn only DBs end
up with very bad results.)

Re: Is Bayes Really Necessary?

2005-05-26 Thread jdow

One way to keep Bayes from running is to never train it.
{^_^}
- Original Message - 
From: "Kristopher Austin" <[EMAIL PROTECTED]>


We have found Bayes to be more trouble than it's worth.  We were
frequently running into problems keeping the database stable and fresh.
We have a site-wide install so that just made it all the more
problematic.

It definitely depends on your situation.  I don't think anyone can make
a blanket statement one way or the other.

We have had great success without Bayes and the amount of admin time
necessary to keep SA running has dropped significantly.

Kris

-Original Message-
From: Jake Colman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 26, 2005 9:09 AM
To: users@spamassassin.apache.org
Subject: Is Bayes Really Necessary?


Given the rather complete set of rules that ship with SA and which can
expanded with SARE, does bayes learning really help?  Won't the rules
catch
pretty much everything anyway?

-- 
Jake Colman

Re: Is Bayes Really Necessary?

2005-05-26 Thread Ralf Hildebrandt

* Jim Maul <[EMAIL PROTECTED]>:

> I have been running sitewide bayes since the beginning without much 
> maintenance at all.  It has autolearned every message itself and its 
> dead on balls accurate.  I've trained maybe 20 message total manually so 
> i dont see how running bayes could actually cause more work for an admin 
> unless its been trained poorly and they have to correct it.

I also train it manually with all the spam that slips through (and some
ham as well, to keep the balance).

-- 
Ralf Hildebrandt (i.A. des IT-Zentrums)         [EMAIL PROTECTED]
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to [EMAIL PROTECTED]

Re: Is Bayes Really Necessary?

2005-05-26 Thread Jim Maul


Ralf Hildebrandt wrote:
> * Kristopher Austin <[EMAIL PROTECTED]>:
>> We have found Bayes to be more trouble than it's worth.  We were
>> frequently running into problems keeping the database stable and fresh.
>> We have a site-wide install so that just made it all the more
>> problematic.
>
> We also have a site-wide install with Bayes (15.000 Users). Where is
> the problem with "keeping the database stable and fresh"? Never
> crashed here.


I have been running sitewide bayes since the beginning without much
maintenance at all.  It has autolearned every message itself and it's
dead on balls accurate.  I've trained maybe 20 messages total manually so
I don't see how running bayes could actually cause more work for an admin
unless it's been trained poorly and they have to correct it.  Even then
it's probably just easier to delete it and start over.


I tag spam at 5.0 and have BAYES_99 at 5.4.  This one rule alone
is enough to mark spam and I haven't had any false positives because of
it yet.
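
The arithmetic behind that statement, as a small Python sketch. The second rule name and its score are made up for illustration; only the 5.0 threshold and the 5.4 BAYES_99 score come from the setup described above.

```python
REQUIRED_SCORE = 5.0

SCORES = {
    "BAYES_99": 5.4,           # from the configuration described above
    "HYPOTHETICAL_RULE": 1.2,  # made-up rule, for illustration only
}

def is_spam(hits):
    """Sum the scores of the rules that fired and compare with the threshold."""
    total = sum(SCORES[name] for name in hits)
    return total >= REQUIRED_SCORE

print(is_spam(["BAYES_99"]))           # one rule is enough on its own
print(is_spam(["HYPOTHETICAL_RULE"]))  # a weak rule alone is not
```

In a SpamAssassin 3.x local.cf this setup would be expressed with "required_score 5.0" and "score BAYES_99 5.4". The trade-off is that a single wrong BAYES_99 verdict then becomes a false positive by itself.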


-Jim

Re: Is Bayes Really Necessary?

2005-05-26 Thread Keith Ivey


Joe Zitnik wrote:
> Bayes definitely helps, but auto-learn can cause problems.  Perhaps a
> better question would be, "Is autolearn really neccessary?"


I think the problems mostly come from accidentally autolearning spam as 
ham, which is easy with the default threshold.  Autolearning messages as 
spam at a reasonable threshold should be okay.


--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC

Re: Is Bayes Really Necessary?

2005-05-26 Thread Joe Zitnik


I have autolearn off.  I have been burned by it twice.

>>> <[EMAIL PROTECTED]> 5/26/2005 10:33 AM >>>
On Thu, 26 May 2005, Joe Zitnik wrote:

> I think points can be made for both sides of the argument.  The thing
> that makes bayes different, is that a well trained bayes database is
> specific to your environment.  If you're a law firm, your learned ham is
> going to be heavy in legalese, medical related org, heavy in that
> terminology.  Because spam and ham is learned specific to your
> environment, it can make a big difference.
>
> >>> Jake Colman <[EMAIL PROTECTED]> 5/26/2005 10:08 AM >>>
>
> Given the rather complete set of rules that ship with SA and which can
> expanded with SARE, does bayes learning really help?  Won't the rules
> catch
> pretty much everything anyway?

Bayes definitely helps, but auto-learn can cause problems.  Perhaps a
better question would be, "Is autolearn really neccessary?"

James Smallacombe PlantageNet, Inc. CEO and Janitor
[EMAIL PROTECTED]
http://3.am

Re: Is Bayes Really Necessary?

2005-05-26 Thread up

On Thu, 26 May 2005, Joe Zitnik wrote:

> I think points can be made for both sides of the argument.  The thing
> that makes bayes different, is that a well trained bayes database is
> specific to your environment.  If you're a law firm, your learned ham is
> going to be heavy in legalese, medical related org, heavy in that
> terminology.  Because spam and ham is learned specific to your
> environment, it can make a big difference.
>
> >>> Jake Colman <[EMAIL PROTECTED]> 5/26/2005 10:08 AM >>>
>
> Given the rather complete set of rules that ship with SA and which can
> expanded with SARE, does bayes learning really help?  Won't the rules
> catch
> pretty much everything anyway?

Bayes definitely helps, but auto-learn can cause problems.  Perhaps a
better question would be, "Is autolearn really necessary?"

James Smallacombe PlantageNet, Inc. CEO and Janitor
[EMAIL PROTECTED]   
http://3.am
=

RE: Is Bayes Really Necessary?

2005-05-26 Thread Chris Santerre



>-Original Message-
>From: Jake Colman [mailto:[EMAIL PROTECTED]
>Sent: Thursday, May 26, 2005 10:09 AM
>To: users@spamassassin.apache.org
>Subject: Is Bayes Really Necessary?
>
>
>
>Given the rather complete set of rules that ship with SA and which can
>expanded with SARE, does bayes learning really help?  Won't 
>the rules catch
>pretty much everything anyway?

Oh my favorite subject!!! :) 

NO! Bayes is not necessary. IMHO, for personal use, it is incredible. But I
feel the care of it is more difficult than your average user would care to
keep up.

For site wide, I'm pretty much against it. I know people will argue that
point. I'm obviously biased towards SARE rules updated with RDJ. And the use
of URIBL.com lists. But these allow general users, or a sitewide install,
to "set and forget". Which is what we strive for, so SA can be more widely
accepted.

I have a 99% filter rate without bayes. And I'm proud of that. 

Chris Santerre 
System Admin and SARE/URIBL Ninja
http://www.rulesemporium.com 
http://www.uribl.com

Re: Is Bayes Really Necessary?

2005-05-26 Thread Joe Zitnik


I think points can be made for both sides of the argument.  The thing that
makes Bayes different is that a well-trained Bayes database is specific to
your environment.  If you're a law firm, your learned ham is going to be
heavy in legalese; if you're a medical-related organization, it will be heavy
in that terminology.  Because spam and ham are learned specifically for your
environment, it can make a big difference.

>>> Jake Colman <[EMAIL PROTECTED]> 5/26/2005 10:08 AM >>>

Given the rather complete set of rules that ship with SA and which can
expanded with SARE, does bayes learning really help?  Won't the rules catch
pretty much everything anyway?

-- 
Jake Colman
Sr. Applications Developer
Principia Partners LLC
Harborside Financial Center
1001 Plaza Two
Jersey City, NJ 07311
(201) 209-2467
www.principiapartners.com

Re: Is Bayes Really Necessary?

2005-05-26 Thread Ralf Hildebrandt

* Kristopher Austin <[EMAIL PROTECTED]>:
> We have found Bayes to be more trouble than it's worth.  We were
> frequently running into problems keeping the database stable and fresh.
> We have a site-wide install so that just made it all the more
> problematic.

We also have a site-wide install with Bayes (15,000 users). Where is
the problem with "keeping the database stable and fresh"? It has never
crashed here.
-- 
Ralf Hildebrandt (i.A. des IT-Zentrums)         [EMAIL PROTECTED]
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to [EMAIL PROTECTED]

Re: Is Bayes Really Necessary?

2005-05-26 Thread Thomas Cameron

On Thu, 2005-05-26 at 10:08 -0400, Jake Colman wrote:
> Given the rather complete set of rules that ship with SA and which can
> expanded with SARE, does bayes learning really help?  Won't the rules catch
> pretty much everything anyway?

I have used SA with Bayes and it took quite a bit of administrative
overhead.  It worked amazingly well, though.  

I now run SA with DCC, Razor, Pyzor and network checks and without Bayes
and it still Just Works(TM).  Seriously - I have customers who slather
their e-mail addresses all over Usenet, message boards, on their web
pages, etc.  They might as well put a big sign up that says SPAM ME
PLEASE!!!  

But they don't get any spam - SA and spamass-milter reject all of it.
It is really amazing - I've got clients who went from hundreds of spams
per day down to one or two that slip through per week.  Of course, when
one gets through, my phone rings!

I guess my experience is that either way, SA Just Works(TM).

Cheers,
Thomas

RE: Is Bayes Really Necessary?

2005-05-26 Thread Kristopher Austin

We have found Bayes to be more trouble than it's worth.  We were
frequently running into problems keeping the database stable and fresh.
We have a site-wide install so that just made it all the more
problematic.

It definitely depends on your situation.  I don't think anyone can make
a blanket statement one way or the other.

We have had great success without Bayes and the amount of admin time
necessary to keep SA running has dropped significantly.

Kris

-Original Message-
From: Jake Colman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 26, 2005 9:09 AM
To: users@spamassassin.apache.org
Subject: Is Bayes Really Necessary?


Given the rather complete set of rules that ship with SA and which can
expanded with SARE, does bayes learning really help?  Won't the rules
catch
pretty much everything anyway?

-- 
Jake Colman
Sr. Applications Developer
Principia Partners LLC
Harborside Financial Center
1001 Plaza Two
Jersey City, NJ 07311
(201) 209-2467
www.principiapartners.com

RE: Is Bayes Really Necessary?

2005-05-26 Thread Steven Manross

Yes, BAYES is an integral part of SA!

It's like a constantly changing rule (without the need to tweak the rule
ever so slightly for nuances in the "new" mail).

There are mails that don't trip any standard rules, but are caught by
bayes alone.

Steven

-Original Message-
From: Jake Colman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 26, 2005 7:09 AM
To: users@spamassassin.apache.org
Subject: Is Bayes Really Necessary?



Given the rather complete set of rules that ship with SA and which can
expanded with SARE, does bayes learning really help?  Won't the rules
catch
pretty much everything anyway?

-- 
Jake Colman
Sr. Applications Developer
Principia Partners LLC
Harborside Financial Center
1001 Plaza Two
Jersey City, NJ 07311
(201) 209-2467
www.principiapartners.com
