RE: Is Bayes Really Necessary?
On Mon, 6 Jun 2005 [EMAIL PROTECTED] wrote:

> David Brodbeck wrote:
> > Loren Wilton wrote:
> >> You'd think that there should be some way to do a reverse DNS to
> >> determine from an ip the domains that exist on that ip. I suspect
> >> though that the whole internet fabric is designed the other way
> >> around, and that this information is probably something that no
> >> single registrar would know.
> >
> > In theory, a reverse lookup could give you all the hostnames
> > associated with that IP. In reality, almost no one actually sets up
> > multiple reverse DNS records for such sites. So yes, it's difficult.
>
> Maybe a "reverse SPF" record is called for...
>
> _spf.0.0.10.in-addr.arpa TXT "example.org, some.example.com"...
>

Two-fold problem with either of those solutions:

1) It would depend upon the spammer actually registering that kind of data and keeping it accurate. (Do you really think that they'll want to give the farm away? ;)

2) The size of DNS answers would quickly get large enough to cause technical problems. DNS normally uses UDP packets to keep overhead low (one small packet for the query, another for the response). As soon as you get more than about 500~1000 bytes of data in an answer you'll have to switch to TCP if you want to get the full data. (A lot more load on the DNS servers and more network overhead. ;( )

--
Dave Funk
University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549
1256 Seamans Center
Sys_admin/Postmaster/cell_admin
Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
RE: Is Bayes Really Necessary?
David Brodbeck wrote:
> Loren Wilton wrote:
>> You'd think that there should be some way to do a reverse DNS to
>> determine from an ip the domains that exist on that ip. I suspect
>> though that the whole internet fabric is designed the other way
>> around, and that this information is probably something that no
>> single registrar would know.
>
> In theory, a reverse lookup could give you all the hostnames
> associated with that IP. In reality, almost no one actually sets up
> multiple reverse DNS records for such sites. So yes, it's difficult.

Maybe a "reverse SPF" record is called for...

_spf.0.0.10.in-addr.arpa TXT "example.org, some.example.com"...

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
perl -e"map{y/a-z/l-za-k/;print}shift" "Jjhi pcdiwtg Ptga wprztg,"
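To make the owner name in that proposal concrete, here is a minimal Python sketch of how a reverse-DNS name is derived from an address. The `reverse_pointer` attribute is part of the standard `ipaddress` module; the `_spf.` prefix and the `reverse_spf_name` helper are hypothetical, taken only from the suggestion above, and do not correspond to any deployed record type.

```python
import ipaddress


def reverse_name(ip: str) -> str:
    """Return the standard reverse-DNS owner name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_spf_name(ip: str) -> str:
    """Hypothetical: prefix the reverse name with the "_spf" label
    suggested in the thread. No such record is standardized."""
    return "_spf." + reverse_name(ip)


if __name__ == "__main__":
    # 10.0.0.25 is a private (RFC 1918) address used purely as an example.
    print(reverse_name("10.0.0.25"))
    print(reverse_spf_name("10.0.0.25"))
```

The octets are reversed because DNS delegates from right to left, so a whole /24 such as 10.0.0.x lives under a single zone, 0.0.10.in-addr.arpa.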
Re: Is Bayes Really Necessary?
Loren Wilton wrote:
> You'd think that there should be some way to do a reverse DNS to
> determine from an ip the domains that exist on that ip. I suspect
> though that the whole internet fabric is designed the other way
> around, and that this information is probably something that no
> single registrar would know.

In theory, a reverse lookup could give you all the hostnames associated with that IP. In reality, almost no one actually sets up multiple reverse DNS records for such sites. So yes, it's difficult.
Re: Is Bayes Really Necessary?
> How exactly do we determine what other sites are hosted on a
> given server, i.e., sites that don't appear in spams? IOW
> how do you know there's "one internal site"?

You'd think that there should be some way to do a reverse DNS to determine from an ip the domains that exist on that ip. I suspect though that the whole internet fabric is designed the other way around, and that this information is probably something that no single registrar would know.

In theory I'd think that one could process zone files to determine what existed on any given ip that was advertised and accessible by name. However, getting one's hands on the zone files in the first place...

Loren
Re: Is Bayes Really Necessary?
On Saturday, June 4, 2005, 6:20:11 AM, jdow jdow wrote: > One tiny quibble. For each machine blocked there is perhaps one whole > internal site that is blocked as well. But it means that site is > throwing spam out to the universe and the company doing it or the > individual doing it should stop the practice or take back ownership > of their machine. THEY might consider themselves "innocent victims." > But it's the only way if they have one bad egg in their company or > an infected computer. Either way they really have no solid claim > on any innocence they may profess. How exactly do we determine what other sites are hosted on a given server, i.e., sites that don't appear in spams? IOW how do you know there's "one internal site"? Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Is Bayes Really Necessary?
>> >[previous stuff snipped]
>> >Loren
>>
>> Loren is correct. And Jeff and I have had this conversation many times. Jeff
>> would rather not risk the FPs by doing it. I can see his point. But I agree
>> with Loren that we have IPs that are pure spam.
>
>One tiny quibble. For each machine blocked there is perhaps one whole
>internal site that is blocked as well. But it means that site is
>throwing spam out to the universe and the company doing it or the
>individual doing it should stop the practice or take back ownership
>of their machine. THEY might consider themselves "innocent victims."
>But it's the only way if they have one bad egg in their company or
>an infected computer. Either way they really have no solid claim
>on any innocence they may profess.
>
>{^_^}
>

This is just one difference between SURBLs and some other lists (e.g. the SBL). The people *do* in some cases have a valid claim of innocence: even most of the worst spam-hosting offenders have at least a few legitimate customers who did not perform adequate due diligence before signing up with, or committing to, a "spammer" or "spam friendly" service. And unfortunately, for cases like this, ignorance is a valid defence (though a good lawyer would argue that for medium to large businesses, such behavior is negligent).

In at least a few cases in which I have been personally involved, contacting a large company caused them to move away, and fast. For example, Ebuyer (from whom I semi-regularly purchase equipment) contracted with a Brazilian porno outfit to host and mail on their behalf. One telephone call and within 5 minutes I was connected to a VP; 36 hours later they were elsewhere, and they now operate their own site from the home office in England. (They were blacklisted by IP on the SBL, SPEWS and quite a few other lists for two days and had no idea until it was explained to them and a non-technical corporate officer was led through checking things like openrbl.org, Spamhaus, etc.)

Bad business practices do not always translate to guilt, either (IANAL) legally or IMNSHO morally. Now, if they had stayed there after being told and the situation explained, they would have lost at least one customer, who also probably would have made sure they got blacklisted in many more places :) Instead, they convinced me that they were a well-meaning company that made a mistake and acted very quickly to remedy the situation. For all those similar companies, with whom I do not do business or even recognize their names, I'm sure that many are in the same boat.

Paul Shupak [EMAIL PROTECTED]
Re: Is Bayes Really Necessary?
From: "Chris Santerre" <[EMAIL PROTECTED]> > >-Original Message- > >From: Loren Wilton [mailto:[EMAIL PROTECTED] > > > >>> If that statement is true, perhaps the surbl lists could > >automatically > >>> include the dotquads for hosts that are known to be > >pure spam sources and > >>> not mixed systems. Then the client could get the ip for a > >suspect hostname > >>> and see if it matched a known spam dotquad. > > > >> I'd swear this came up before. The one (slight?) problem > >with this tactic is > >> that you can have too many FPs if a spammer targets a legit hosting > >> operation. > > > >I think there was a failure to read all the words in my > >original post. > > > >I quite specifically suggested that listing ips should be > >limited to hosts that are known to be pure spam > >sources. If the host is KNOWN to be purely spam > >(ie: it is owned and run by the spammer), I fail completely to > >see how matching on the known IP for that host can either > >target or hit innocent bystanders; or indeed bystanders of any sort. > > > >It might be argued that making the determination that a host > >is a pure spam host could be hard. This may well be true. > >But despite that, I'd bet that Jeff or Chris could probably > >list off a dozen or hundred or so hosts that they know quite > >well serve nothing except spammer domains. I fail completely > >to see how matching on the ip for these known hosts can do > >anything but good, assuming the ip lookup is limited to the > >resolved ips of urls found in the spam. > > > >Loren > > Loren is correct. And Jeff and I have had this conversation many times. Jeff > would rather not risk the FPs by doing it. I can see his point. But I agree > with Loren that we have IPs that are pure spam. One tiny quibble. For each machine blocked there is perhaps one whole internal site that is blocked as well. 
But it means that site is throwing spam out to the universe and the company doing it or the individual doing it should stop the practice or take back ownership of their machine. THEY might consider themselves "innocent victims." But it's the only way if they have one bad egg in their company or an infected computer. Either way they really have no solid claim on any innocence they may profess. {^_^}
Re: Is Bayes Really Necessary?
On Friday, June 3, 2005, 3:47:05 AM, Loren Wilton wrote: >>> If that statement is true, perhaps the surbl lists could automatically >>> include the dotquads for hosts that are known to be pure spam >>> sources and >>> not mixed systems. Then the client could get the ip for a suspect hostname >>> and see if it matched a known spam dotquad. >> I'd swear this came up before. The one (slight?) problem with this tactic >> is >> that you can have too many FPs if a spammer targets a legit hosting >> operation. > I think there was a failure to read all the words in my original post. > I quite specifically suggested that listing ips should be limited to hosts > that are known to be pure spam sources. If the host is KNOWN > to be purely spam (ie: it is owned and run > by the spammer), I fail completely to see how matching on the known IP for > that host can either target or hit innocent bystanders; or indeed bystanders > of any sort. > It might be argued that making the determination that a host is a pure spam > host could be hard. This may well be true. But despite that, I'd bet that > Jeff or Chris could probably list off a dozen > or hundred or so hosts that they know quite well serve nothing except spammer > domains. I fail completely to see how matching on the ip for these known > hosts can do anything but good, assuming the > ip lookup is limited to the resolved ips of urls found in the spam. > Loren It's possible to say some IPs are used in a lot of spam. Is it possible to say those IPs are only used in spam? Sure... if we were omniscient. ;-) Otherwise we don't know for certain whether there are innocent bystanders there. It's probably safer to list the URIs that are actually seen in spams than to blacklist IPs or networks. The question then becomes how to get them listed quickly, and if you see the link I provided you will note that we have a strategy for that which we will be trying RSN: http://www.surbl.org/faq.html#numbered Cheers, Jeff C. 
-- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Is Bayes Really Necessary?
List Mail User wrote:
> And adding a URI rule for the completewhois list (basically the same
> function as the no longer existing ipwhois.rfc-ignorant.org list) will hit
> yet more name servers and spammer IPs with slightly fewer FPs (no issue
> with escalations). The list is:
>
> combined-HIB.dnsiplists.completewhois.com

This works miracles, except that their uptime and speed are not always the best. If we could get more mirrors going it would be a valuable addition.

Alex
Re: Is Bayes Really Necessary?
>... > >On Friday, June 3, 2005, 12:33:26 AM, Duncan Hill wrote: >> On Friday 03 June 2005 08:10, Loren Wilton typed: >>> It was basically "the spammer makes a zillion new domains, and they all >>> take time to get into SURBL, so some spam gets through. But they all point >>> to the same dotted quad, and I can match on that lookup". >>> >>> If that statement is true, perhaps the surbl lists could automatically >>> include the dotquads for hosts that are known to be pure spam sources and >>> not mixed systems. Then the client could get the ip for a suspect hostname >>> and see if it matched a known spam dotquad. > >> I'd swear this came up before. The one (slight?) problem with this tactic >> is >> that you can have too many FPs if a spammer targets a legit hosting >> operation. > >Exactly. Listing resolved IPs magnifies the problems with false >positives, joe jobs and collateral damage. Please see: > > http://www.surbl.org/faq.html#numbered > >"Are there plans to offer an RBL list with the domain names >resolved into IP addresses?" > >> Postifx does have a neat restriction to reject based on the IP address of >> the >> name server. You run the same risk, but I've noticed that the pr1ces, al1v3 >> and so on spammer has used the same NS servers for each one > >Using sbl.spamhaus.org with uridnsbl in SA3 does something >similar. SBL has many spammer nameservers listed in it and >uridnsbl checks a URI's nameservers against SBL. It tends >to detect many spamy domains that way (and occasionally a few >relatively innocent bystanders). > >Jeff C. >-- >Jeff Chan >mailto:[EMAIL PROTECTED] >http://www.surbl.org/ > > And adding a URI rule for the completewhois list (basically the same function as the no longer existing ipwhois.rfc-ignorant.org list) will hit yet more name servers and spammer IPs with slightly fewer FPs (no issue with escalations). The list is: combined-HIB.dnsiplists.completewhois.com Paul Shupak [EMAIL PROTECTED] P.S. 
And if you can afford many more FPs, you can use SPEWS L1 with a low score (catches far more than the other two combined, but has serious issues with "escalations" and "innocent bystanders").
RE: Is Bayes Really Necessary?
>-Original Message-
>From: Loren Wilton [mailto:[EMAIL PROTECTED]
>Sent: Friday, June 03, 2005 6:47 AM
>To: Duncan Hill; users@spamassassin.apache.org
>Subject: Re: Is Bayes Really Necessary?
>
>>> If that statement is true, perhaps the surbl lists could automatically
>>> include the dotquads for hosts that are known to be pure spam sources and
>>> not mixed systems. Then the client could get the ip for a suspect hostname
>>> and see if it matched a known spam dotquad.
>
>> I'd swear this came up before. The one (slight?) problem with this tactic is
>> that you can have too many FPs if a spammer targets a legit hosting
>> operation.
>
>I think there was a failure to read all the words in my original post.
>
>I quite specifically suggested that listing ips should be limited to hosts
>that are known to be pure spam sources. If the host is KNOWN to be purely
>spam (ie: it is owned and run by the spammer), I fail completely to see how
>matching on the known IP for that host can either target or hit innocent
>bystanders; or indeed bystanders of any sort.
>
>It might be argued that making the determination that a host is a pure spam
>host could be hard. This may well be true. But despite that, I'd bet that
>Jeff or Chris could probably list off a dozen or hundred or so hosts that
>they know quite well serve nothing except spammer domains. I fail completely
>to see how matching on the ip for these known hosts can do anything but
>good, assuming the ip lookup is limited to the resolved ips of urls found in
>the spam.
>
>Loren

Loren is correct. And Jeff and I have had this conversation many times. Jeff would rather not risk the FPs by doing it. I can see his point. But I agree with Loren that we have IPs that are pure spam.

But we watch those on the backend like Loren said. Getting more automated as well. So rather than do the extra processing up front, our research just pays more attention to those 'pure evil' hosts. Which is one of the reasons the domains fall into black.uribl.com so fast.

I won't release the list of IPs I have now. Not yet anyway. Don't want them to move :)

Chris Santerre
System Admin and SARE/URIBL Ninja
http://www.rulesemporium.com
http://www.uribl.com
Re: Is Bayes Really Necessary?
>> If that statement is true, perhaps the surbl lists could automatically >> include the dotquads for hosts that are known to be pure spam >> sources and >> not mixed systems. Then the client could get the ip for a suspect hostname >> and see if it matched a known spam dotquad. > I'd swear this came up before. The one (slight?) problem with this tactic is > that you can have too many FPs if a spammer targets a legit hosting > operation. I think there was a failure to read all the words in my original post. I quite specifically suggested that listing ips should be limited to hosts that are known to be pure spam sources. If the host is KNOWN to be purely spam (ie: it is owned and run by the spammer), I fail completely to see how matching on the known IP for that host can either target or hit innocent bystanders; or indeed bystanders of any sort. It might be argued that making the determination that a host is a pure spam host could be hard. This may well be true. But despite that, I'd bet that Jeff or Chris could probably list off a dozen or hundred or so hosts that they know quite well serve nothing except spammer domains. I fail completely to see how matching on the ip for these known hosts can do anything but good, assuming the ip lookup is limited to the resolved ips of urls found in the spam. Loren
Re: Is Bayes Really Necessary?
On Friday, June 3, 2005, 12:33:26 AM, Duncan Hill wrote: > On Friday 03 June 2005 08:10, Loren Wilton typed: >> It was basically "the spammer makes a zillion new domains, and they all >> take time to get into SURBL, so some spam gets through. But they all point >> to the same dotted quad, and I can match on that lookup". >> >> If that statement is true, perhaps the surbl lists could automatically >> include the dotquads for hosts that are known to be pure spam sources and >> not mixed systems. Then the client could get the ip for a suspect hostname >> and see if it matched a known spam dotquad. > I'd swear this came up before. The one (slight?) problem with this tactic is > that you can have too many FPs if a spammer targets a legit hosting > operation. Exactly. Listing resolved IPs magnifies the problems with false positives, joe jobs and collateral damage. Please see: http://www.surbl.org/faq.html#numbered "Are there plans to offer an RBL list with the domain names resolved into IP addresses?" > Postifx does have a neat restriction to reject based on the IP address of the > name server. You run the same risk, but I've noticed that the pr1ces, al1v3 > and so on spammer has used the same NS servers for each one Using sbl.spamhaus.org with uridnsbl in SA3 does something similar. SBL has many spammer nameservers listed in it and uridnsbl checks a URI's nameservers against SBL. It tends to detect many spamy domains that way (and occasionally a few relatively innocent bystanders). Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Is Bayes Really Necessary?
On Friday 03 June 2005 08:10, Loren Wilton typed:
> It was basically "the spammer makes a zillion new domains, and they all
> take time to get into SURBL, so some spam gets through. But they all point
> to the same dotted quad, and I can match on that lookup".
>
> If that statement is true, perhaps the surbl lists could automatically
> include the dotquads for hosts that are known to be pure spam sources and
> not mixed systems. Then the client could get the ip for a suspect hostname
> and see if it matched a known spam dotquad.

I'd swear this came up before. The one (slight?) problem with this tactic is that you can have too many FPs if a spammer targets a legit hosting operation.

Postfix does have a neat restriction to reject based on the IP address of the name server. You run the same risk, but I've noticed that the pr1ces, al1v3 and so on spammer has used the same NS servers for each one.
Re: Is Bayes Really Necessary?
> SURBLs on the other hand have mostly domain names with a few IPs.
> Whatever appears in URI host portions is what goes into SURBLs.
> Usually URIs have domain names so that's what most of the SURBL
> records are.

Jeff, the OP (or someone) had an interesting idea, I thought.

It was basically "the spammer makes a zillion new domains, and they all take time to get into SURBL, so some spam gets through. But they all point to the same dotted quad, and I can match on that lookup".

If that statement is true, perhaps the surbl lists could automatically include the dotquads for hosts that are known to be pure spam sources and not mixed systems. Then the client could get the ip for a suspect hostname and see if it matched a known spam dotquad.

Possibly this would want to be a separate list. Alternately, it might be possible as 'backend processing' inside surbl itself. For instance, you could run your own caching dns. Any hostname lookup request not matching the current list (or the whitelist) gets looked up. If the ip address matches that of a known spam host, the hostname is automatically added to the list and a positive hit is returned to the original requestor. Instant catching of unknown spam domains!

Of course with your policies you may simply want to add the domain name to a list for manual review rather than directly including it. Or perhaps establish a new list that is scored deliberately at half the normal surbl score, add the name to that list, and flag it for manual review. If it is spam, it will provide at least some early warning to people receiving it. If it turns out to be a false hit, it will be found in manual review and removed from the list shortly, and in the meantime the low score means no great harm will likely be done.

I think this is a concept worth thinking about. Domain names are near infinite, but there is a limit on IPv4 addresses; so a lot of domain names must end up mapping to the same ip address in some way or other. This is something that we should be able to exploit.

Loren
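The 'backend processing' idea above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not a description of how SURBL operates: the class name `SpamHostTracker`, the four result strings, and the injected `resolve` callable are all hypothetical, and DNS resolution is passed in as a function so the logic can be exercised without network access.

```python
from typing import Callable, List, Set

# A resolver maps a hostname to the IPv4 addresses it resolves to.
Resolver = Callable[[str], List[str]]


class SpamHostTracker:
    """Toy model of the proposal: unknown hostnames that resolve to a
    known pure-spam IP go onto a provisional (half-score) list and are
    queued for manual review."""

    def __init__(self, resolve: Resolver, listed: Set[str],
                 whitelist: Set[str], pure_spam_ips: Set[str]) -> None:
        self.resolve = resolve
        self.listed = set(listed)
        self.whitelist = set(whitelist)
        self.pure_spam_ips = set(pure_spam_ips)
        self.provisional: Set[str] = set()
        self.review_queue: List[str] = []

    def check(self, hostname: str) -> str:
        host = hostname.lower().rstrip(".")
        if host in self.whitelist:
            return "whitelisted"
        if host in self.listed:
            return "listed"
        if host in self.provisional:
            return "provisional"
        # Only hostnames not already known are resolved.
        if any(ip in self.pure_spam_ips for ip in self.resolve(host)):
            self.provisional.add(host)
            self.review_queue.append(host)
            return "provisional"
        return "clean"


if __name__ == "__main__":
    # Made-up data: reserved .example names and documentation IP ranges.
    fake_dns = {"new-pills.example": ["192.0.2.10"]}
    tracker = SpamHostTracker(lambda h: fake_dns.get(h, []),
                              listed={"old-spam.example"},
                              whitelist={"good.example"},
                              pure_spam_ips={"192.0.2.10"})
    print(tracker.check("new-pills.example"))
```

Keeping the provisional list separate from the main list mirrors the half-score suggestion: a client could weight a "provisional" answer lower than a "listed" one until a human has confirmed it.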
Re: Is Bayes Really Necessary?
On Thursday, May 26, 2005, 12:49:05 PM, Evan Langlois wrote:
> On Thu, 2005-05-26 at 10:42 -0400, Chris Santerre wrote:
>> For site wide, I'm pretty much against it. I know people will argue that
>> point. I'm obviously biased towards SARE rules updated with RDJ. And the use
>> of URIBL.com lists. But these allow a general user, or a sitewide install,
>> to "set and forget". Which is what we strive for, so SA can be more widely
>> accepted.
>>
>> I have a 99% filter rate without bayes. And I'm proud of that.

> I've been testing URIBL and SURBL against just reversing the hostnames
> and looking it up on SBL-XBL,

SBL and XBL have numeric IP addresses, so they shouldn't match host names.

SURBLs on the other hand have mostly domain names with a few IPs. Whatever appears in URI host portions is what goes into SURBLs. Usually URIs have domain names so that's what most of the SURBL records are.

Cheers,

Jeff C.
--
Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
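The difference between the two kinds of list shows up in how the DNS query name is built. A minimal sketch in Python, with assumptions labeled: the helper names are made up for illustration, `sbl-xbl.spamhaus.org` is the zone implied by "SBL-XBL" above, and `multi.surbl.org` is assumed as the SURBL zone since the thread does not name one.

```python
import ipaddress


def ip_dnsbl_query(ip: str, zone: str) -> str:
    """IP-based lists are keyed by the address with its octets reversed."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def uri_dnsbl_query(domain: str, zone: str) -> str:
    """URI lists are keyed by the domain name as it appears in the URI."""
    return domain.lower().rstrip(".") + "." + zone


if __name__ == "__main__":
    # 192.0.2.99 and spammer.example are reserved documentation values.
    print(ip_dnsbl_query("192.0.2.99", "sbl-xbl.spamhaus.org"))
    print(uri_dnsbl_query("spammer.example", "multi.surbl.org"))
```

A hostname handed to an IP-keyed list therefore produces a name that list never publishes. URI lists are normally queried on the registered domain rather than the full hostname; this sketch does not attempt that reduction.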
RE: Is Bayes Really Necessary?
>-Original Message-
>From: Jake Colman [mailto:[EMAIL PROTECTED]
>Sent: Friday, May 27, 2005 9:47 AM
>To: users@spamassassin.apache.org
>Subject: Re: Is Bayes Really Necessary?
>
>OK. I misunderstood. The URIBLS are working fine. Interestingly, although
>I use the SARE rules and URIBLS, some spam is still slipping through. This
>spam is fairly obvious spam so I am a bit surprised. Should I be tweaking
>the scoring?
>

Need an example with header info.

--Chris
Re: Is Bayes Really Necessary?
OK. I misunderstood. The URIBLS are working fine. Interestingly, although I use the SARE rules and URIBLS, some spam is still slipping through. This spam is fairly obvious spam some I am a bit surprised. Should I be tweaking the scoring? > "MK" == Matt Kettler <[EMAIL PROTECTED]> writes: MK> Jake Colman wrote: >>> "CS" == Chris Santerre <[EMAIL PROTECTED]> writes: >> CS> If you are using SA 3.x, support is already included. You simply have CS> to create the config file, restart spamd, and *poof* way less spam. >> CS> Net::Dns is required. I forget which version. I forget a lot of CS> stuff. What was the question? >> >> Chris, >> >> Now I'm confused. The usage page on the site says to create a simple .cf >> file containing a number of lines. Is that it? If I have that .cf file in >> my /etc/mail/spamassassin directory it will all simply work? >> ...Jake >> MK> Jake, that "simple cf file" *should* already included by default with SA 3.0.x. MK> You really shouldn't have to create a config file, or do anything at all to get MK> URIBL's going. MK> http://www.surbl.org/ mentions suggestions about adding rules, but most of the MK> surbl lists are already built into SA 3.0. The only one that's missing is the JP MK> list, which came on-line to late to make it into the 3.0 release. Add it if you MK> want, but do so AFTER you get the built-in ones going. MK> If the URIBLs aren't going, check these two things: MK> 1) check to make sure you have /etc/mail/spamassassin/init.pre. Some MK> distribution packages left this file out when they converted the tarball (oops) MK> Without the init.pre, the plugin for URIBL's doesn't get loaded. MK> It should have this statement in it to support URIBLs: MK> loadplugin Mail::SpamAssassin::Plugin::URIDNSBL >> Yes, I have Net::DNS since I am already doing all the other net checks. >> MK> 2) Just because your copy of Net::DNS works for RBLs does not mean it will work MK> for the URIBLs. 
You need a higher version of Net::DNS to support URIBLs than you MK> need for normal net checks. MK> Check spamassassin --lint -D to see if it's complaining about the version of MK> Net::DNS. -- Jake Colman Sr. Applications Developer Principia Partners LLC Harborside Financial Center 1001 Plaza Two Jersey City, NJ 07311 (201) 209-2467 www.principiapartners.com
Re: Is Bayes Really Necessary?
From: "David B Funk" <[EMAIL PROTECTED]>

> As spammers are constantly mutating and adapting, having a dynamic,
> adaptive component of SA is a must to avoid the "saw-tooth" effect.
> (a fresh SA install works great, gradually loses effectiveness until a
> new update install, and so on).

Um, yeah, you make a fresh install with no SARE rules and it's REALLY bad. It sawtooths upwards as you break down and install more SARE rules. Then a periodic update keeps you up there quite nicely.

Seriously, I was AMAZED at how bad a raw 3.02 install was here until I put in the SARE rules, even after I got the Bayes trained. (Did that right away off my saved ham and spam database.)

{^_-}
Re: Is Bayes Really Necessary?
From: "Jim Maul" <[EMAIL PROTECTED]> > Gotta stop smokin the green ;) Yeah, it's better if you shovel the random greens you find into the compost pit. Not many people will look for them in a compost pit when they get reported as missing persons. {O,o}
Re: Is Bayes Really Necessary?
From: "List Mail User" <[EMAIL PROTECTED]>

> Though nobody seems to have said it exactly this way: It seems
> to be becoming very obvious that the people who say they have problems
> with Bayes are those who support a diverse group of users (e.g. ISPs
> and email providers) and those who find it works well, even with autolearning,
> are those with either small numbers of users or users who are mostly of
> a very specific categorization type (e.g. medical, legal, technical, or
> just about any homogeneous group).

I suspect you are right, Paul. And I restrict the group a little further to suggest it is large ISPs with diverse customer bases and global Bayes who have the most trouble. Per-user Bayes, a good set of SARE rules, and significantly widened autolearn thresholds from base install levels may be their solution. Global Bayes is probably the ISP poison proposition. And autolearn with normal thresholds is probably further poison.

But then, I run manual learn, private Bayes, and LOTS of rules. (40 sets of SARE rules plus my own largish set of rules that apply to me but not others works nicely along with the private Bayes.)

{^_-}
Re: Is Bayes Really Necessary?
From: "Matt Kettler" <[EMAIL PROTECTED]>

(Sneaky one you are - you got around my Reply-To markup for this list. For that you get an extra copy. {^_-})

> jdow wrote:
> > One way to keep Bayes from running is to never train it.
> > {^_^}
>
> You'd also disable autolearning. By default SA will eventually autolearn enough
> email to begin using bayes. (and often these pure auto-learn only DBs end up
> with very bad results.)

I said what you could do. I left how as an exercise for the student. I figure if he tries without Bayes for a while (kill all training and move the bayes database into a corner somewhere that SA cannot find) he may find his one true answer for his question.

{^_-} <- Self has determined for her situation Bayes is necessary.
Re: Is Bayes Really Necessary?
Jake Colman wrote:
>>"CS" == Chris Santerre <[EMAIL PROTECTED]> writes:
>
>CS> If you are using SA 3.x, support is already included. You simply have
>CS> to create the config file, restart spamd, and *poof* way less spam.
>
>CS> Net::Dns is required. I forget which version. I forget a lot of
>CS> stuff. What was the question?
>
> Chris,
>
> Now I'm confused. The usage page on the site says to create a simple .cf
> file containing a number of lines. Is that it? If I have that .cf file in
> my /etc/mail/spamassassin directory it will all simply work?
> ...Jake
>

Jake, that "simple cf file" *should* already be included by default with SA 3.0.x. You really shouldn't have to create a config file, or do anything at all to get URIBLs going.

http://www.surbl.org/ mentions suggestions about adding rules, but most of the surbl lists are already built into SA 3.0. The only one that's missing is the JP list, which came on-line too late to make it into the 3.0 release. Add it if you want, but do so AFTER you get the built-in ones going.

If the URIBLs aren't going, check these two things:

1) Check to make sure you have /etc/mail/spamassassin/init.pre. Some distribution packages left this file out when they converted the tarball (oops). Without the init.pre, the plugin for URIBLs doesn't get loaded. It should have this statement in it to support URIBLs:

loadplugin Mail::SpamAssassin::Plugin::URIDNSBL

> Yes, I have Net::DNS since I am already doing all the other net checks.
>

2) Just because your copy of Net::DNS works for RBLs does not mean it will work for the URIBLs. You need a higher version of Net::DNS to support URIBLs than you need for normal net checks.

Check spamassassin --lint -D to see if it's complaining about the version of Net::DNS.
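The first of those two checks can be automated. The Python sketch below looks for an uncommented `loadplugin` line for the URIDNSBL plugin. The file path and plugin name come from the message above; the function name `loads_uridnsbl` is made up for illustration, and the parsing is deliberately naive (it treats everything after `#` as a comment and does not follow other configuration files).

```python
from pathlib import Path

PLUGIN = "Mail::SpamAssassin::Plugin::URIDNSBL"


def loads_uridnsbl(config_text: str) -> bool:
    """Return True if any non-comment line loads the URIDNSBL plugin."""
    for raw in config_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "loadplugin" and fields[1] == PLUGIN:
            return True
    return False


def main() -> None:
    path = Path("/etc/mail/spamassassin/init.pre")
    try:
        text = path.read_text(errors="replace")
    except OSError as exc:
        # A missing file is exactly the packaging problem described above.
        print(f"cannot read {path}: {exc}")
        return
    if loads_uridnsbl(text):
        print("URIDNSBL plugin is loaded")
    else:
        print("no loadplugin line for URIDNSBL found")


if __name__ == "__main__":
    main()
```

This only covers the first check; the Net::DNS version question still needs the `spamassassin --lint -D` run mentioned above.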
Re: Is Bayes Really Necessary?
> "CS" == Chris Santerre <[EMAIL PROTECTED]> writes:

>> I already use RDJ and the automatic updater. How do I use URIBL? I
>> looked at the usage page and I understand that I need to create a .cf
>> file but how does it access the lists?

CS> If you are using SA 3.x, support is already included. You simply have
CS> to create the config file, restart spamd, and *poof* way less spam.

CS> Net::Dns is required. I forget which version. I forget a lot of
CS> stuff. What was the question?

Chris,

Now I'm confused. The usage page on the site says to create a simple .cf file containing a number of lines. Is that it? If I have that .cf file in my /etc/mail/spamassassin directory it will all simply work?

Yes, I have Net::DNS since I am already doing all the other net checks.

...Jake

--
Jake Colman
Sr. Applications Developer
Principia Partners LLC
Harborside Financial Center
1001 Plaza Two
Jersey City, NJ 07311
(201) 209-2467
www.principiapartners.com
Re: Is Bayes Really Necessary?
On Thu, 26 May 2005, Thomas Cameron wrote:

> On Thu, 2005-05-26 at 10:08 -0400, Jake Colman wrote:
> > Given the rather complete set of rules that ship with SA and which can
> > be expanded with SARE, does bayes learning really help? Won't the rules catch
> > pretty much everything anyway?
>
> I have used SA with Bayes and it took quite a bit of administrative
> overhead. It worked amazingly well, though.
>
> I now run SA with DCC, Razor, Pyzor and network checks and without Bayes
> and it still Just Works(TM). Seriously - I have customers who slather

You could make the argument that Razor, Pyzor, etc. perform a similar function to Bayes (analyze a message, generate some kind of 'collapsed' representation, compare it with a database of known messages and come up with a "spamminess" value).

As spammers are constantly mutating and adapting, having a dynamic, adaptive component of SA is a must to avoid the "saw-tooth" effect (a fresh SA install works great, gradually loses effectiveness until a new update install, and so on).

Bayes has the advantage that it's local, has no network overhead, and can be trained to 'know' your specific kinds of messages. Bayes has the disadvantage that it's your local responsibility to see that it's trained properly.

--
Dave Funk
University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549
1256 Seamans Center
Sys_admin/Postmaster/cell_admin
Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
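The analyze, collapse, compare pattern described above can be illustrated with a deliberately simplified Python sketch. It is not how Razor, Pyzor or DCC compute their signatures (each uses its own scheme designed to survive small mutations); a plain SHA-256 over whitespace-normalized, lowercased text stands in here, and the `reports` table and `threshold` value are made-up illustration data.

```python
import hashlib
import re
from typing import Dict


def collapse(body: str) -> str:
    """Reduce a message body to a fixed-size digest after normalizing
    whitespace and case. A stand-in for a real fuzzy signature."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def spamminess(body: str, reports: Dict[str, int], threshold: int = 10) -> float:
    """Score from 0.0 to 1.0 based on how often this digest was reported."""
    count = reports.get(collapse(body), 0)
    return min(count / threshold, 1.0)


if __name__ == "__main__":
    # Hypothetical shared database: digest -> number of spam reports.
    reports = {collapse("Buy  CHEAP pills\nnow"): 25}
    print(spamminess("buy cheap pills now", reports))
    print(spamminess("Minutes from Tuesday's meeting", reports))
```

An exact digest like this breaks as soon as the spammer changes a single word, which is one reason the real services use fuzzier signatures and why a locally trained, token-based component such as Bayes complements them.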
RE: Is Bayes Really Necessary?
On Thu, 2005-05-26 at 10:42 -0400, Chris Santerre wrote: > For site wide, I'm pretty much against it. I know people will argue that > point. I'm obviously biased towards SARE rules updated with RDJ. And the use > of URIBL.com lists. But these allow a general users, or a sitewide install > to "set and forget". Which is what we strive for, so SA can be more widley > excepted. > > I have a 99% filter rate without bayes. And I'm proud of that. I've been testing URIBL and SURBL against just reversing the hostnames and looking it up on SBL-XBL, and I can say that URIBL and SURBL don't catch nearly the number of spams. I get close to a 99% filter rate just checking the links alone.
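For readers unfamiliar with the lookup being described: a DNS blocklist is normally queried by reversing the octets of an IPv4 address and prepending them to the list's zone. The following is a minimal sketch under assumptions. The function name dnsbl_query_name is hypothetical, the zone sbl-xbl.spamhaus.org is assumed to be what "SBL-XBL" refers to, and the step of resolving each link's hostname to an IP address is left out.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "sbl-xbl.spamhaus.org") -> str:
    """Build the DNS name to query for an IPv4 address on a DNS blocklist.

    The zone default is an assumption for illustration only.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"
```

The resulting name would then be resolved with an ordinary A-record query. By DNSBL convention, getting an answer means the address is listed and NXDOMAIN means it is not.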
Re: Is Bayes Really Necessary?
Chris Santerre wrote: -Original Message- From: Jake Colman [mailto:[EMAIL PROTECTED] Sent: Thursday, May 26, 2005 2:54 PM To: users@spamassassin.apache.org Subject: Re: Is Bayes Really Necessary? "CS" == Chris Santerre <[EMAIL PROTECTED]> writes: >> -Original Message- >> From: Jake Colman [mailto:[EMAIL PROTECTED] >> Sent: Thursday, May 26, 2005 10:09 AM >> To: users@spamassassin.apache.org >> Subject: Is Bayes Really Necessary? >> >> >> >> Given the rather complete set of rules that ship with SA and which can >> expanded with SARE, does bayes learning really help? Won't >> the rules catch >> pretty much everything anyway? CS> Oh my favorite subject!!! :) CS> NO! Bayes is not necessary. IMHO, for personal use, it is incredible. But I CS> feel the care of it is more difficult then your average user would care to CS> keep up. CS> For site wide, I'm pretty much against it. I know people will argue that CS> point. I'm obviously biased towards SARE rules updated with RDJ. And the use CS> of URIBL.com lists. But these allow a general users, or a sitewide install CS> to "set and forget". Which is what we strive for, so SA can be more widley CS> excepted. CS> I have a 99% filter rate without bayes. And I'm proud of that. CS> Chris Santerre CS> System Admin and SARE/URIBL Ninja CS> http://www.rulesemporium.com CS> http://www.uribl.com I already use RDJ and the automatic updater. How do I use URIBL? I looked at the usage page and I undersyand that I need to create a .cf file but how does it access the lists? If you are using SA 3.x, support is already included. You simply have to create the config file, restart spamd, and *poof* way less spam. Net::Dns is required. I forget which version. I forget a lot of stuff. What was the question? --Chris Gotta stop smokin the green ;) -Jim
RE: Is Bayes Really Necessary?
>-Original Message- >From: Jake Colman [mailto:[EMAIL PROTECTED] >Sent: Thursday, May 26, 2005 2:54 PM >To: users@spamassassin.apache.org >Subject: Re: Is Bayes Really Necessary? > > >>>>>> "CS" == Chris Santerre <[EMAIL PROTECTED]> writes: > > >> -Original Message- > >> From: Jake Colman [mailto:[EMAIL PROTECTED] > >> Sent: Thursday, May 26, 2005 10:09 AM > >> To: users@spamassassin.apache.org > >> Subject: Is Bayes Really Necessary? > >> > >> > >> > >> Given the rather complete set of rules that ship with SA >and which can > >> expanded with SARE, does bayes learning really help? Won't > >> the rules catch > >> pretty much everything anyway? > > CS> Oh my favorite subject!!! :) > > CS> NO! Bayes is not necessary. IMHO, for personal use, it >is incredible. But I > CS> feel the care of it is more difficult then your average >user would care to > CS> keep up. > > CS> For site wide, I'm pretty much against it. I know >people will argue that > CS> point. I'm obviously biased towards SARE rules updated >with RDJ. And the use > CS> of URIBL.com lists. But these allow a general users, or >a sitewide install > CS> to "set and forget". Which is what we strive for, so SA >can be more widley > CS> excepted. > > CS> I have a 99% filter rate without bayes. And I'm proud of that. > > CS> Chris Santerre > CS> System Admin and SARE/URIBL Ninja > CS> http://www.rulesemporium.com > CS> http://www.uribl.com > >I already use RDJ and the automatic updater. How do I use >URIBL? I looked >at the usage page and I undersyand that I need to create a .cf >file but how >does it access the lists? If you are using SA 3.x, support is already included. You simply have to create the config file, restart spamd, and *poof* way less spam. Net::Dns is required. I forget which version. I forget a lot of stuff. What was the question? --Chris
Re: Is Bayes Really Necessary?
> "CS" == Chris Santerre <[EMAIL PROTECTED]> writes: >> -Original Message- >> From: Jake Colman [mailto:[EMAIL PROTECTED] >> Sent: Thursday, May 26, 2005 10:09 AM >> To: users@spamassassin.apache.org >> Subject: Is Bayes Really Necessary? >> >> >> >> Given the rather complete set of rules that ship with SA and which can >> be expanded with SARE, does bayes learning really help? Won't >> the rules catch >> pretty much everything anyway? CS> Oh my favorite subject!!! :) CS> NO! Bayes is not necessary. IMHO, for personal use, it is incredible. But I CS> feel the care of it is more difficult than your average user would care to CS> keep up. CS> For site wide, I'm pretty much against it. I know people will argue that CS> point. I'm obviously biased towards SARE rules updated with RDJ. And the use CS> of URIBL.com lists. But these allow general users, or a sitewide install CS> to "set and forget". Which is what we strive for, so SA can be more widely CS> accepted. CS> I have a 99% filter rate without bayes. And I'm proud of that. CS> Chris Santerre CS> System Admin and SARE/URIBL Ninja CS> http://www.rulesemporium.com CS> http://www.uribl.com I already use RDJ and the automatic updater. How do I use URIBL? I looked at the usage page and I understand that I need to create a .cf file but how does it access the lists? -- Jake Colman Sr. Applications Developer Principia Partners LLC Harborside Financial Center 1001 Plaza Two Jersey City, NJ 07311 (201) 209-2467 www.principiapartners.com
Re: Is Bayes Really Necessary?
On Thursday May 26 2005 1:13 pm, Loren Wilton wrote: > > Given the rather complete set of rules that ship with SA and which can > > expanded with SARE, does bayes learning really help? Won't the rules > > catch > > > pretty much everything anyway? > > Um, maybe, maybe not. > > Bayes *necessary*? No, especially if you run net tests. > Bayes *highly desirable*? Yup. An additional 4 points can really help > when a new spam shows up that you don't have a lot of rules for. > > Loren Loren's point well taken. I think it's the use of bayes in conjunction with other rules that tends to work best. At least, that's my experience. Dimitri
Re: Is Bayes Really Necessary?
> Given the rather complete set of rules that ship with SA and which can > expanded with SARE, does bayes learning really help? Won't the rules catch > pretty much everything anyway? Um, maybe, maybe not. Bayes *necessary*? No, especially if you run net tests. Bayes *highly desirable*? Yup. An additional 4 points can really help when a new spam shows up that you don't have a lot of rules for. Loren
Re: Is Bayes Really Necessary?
Though nobody seems to have said it exactly this way: It seems to be becoming very obvious that the people who say they have problems with Bayes are those who support a diverse group of users (e.g. ISPs and email providers), and those who find it works well, even with autolearning, are those with either small numbers of users or users who mostly fall into one very specific category (e.g. medical, legal, technical, or just about any homogeneous group). Despite the oft-repeated claim that spammers are dumb, not all are. And the "Bayes poison" we all see added to spam must work for some group, and I would guess that it is exactly those sites which have diverse user bases and have primarily "personal conversational" content in lots of the email running through their systems. For me, the few times I see Bayes give apparently wrong answers are in email from friends and family, and never from clients or technical contacts. (And it is certainly worse that many members of my family have spent their entire careers in marketing - they often get a BAYES_80 score when writing me.) This lends support to the notion that the added text does indeed match some types of common communication. If my supposition is correct, the question then becomes: Can using personal (i.e. per user) Bayes overcome the problems which some users/sites see? I'm not sure how to test this - certainly I couldn't myself, but maybe some of the other members of this list are able to and could try. Even if it does work, the resource load may be too high to be reasonable for many large sites. Paul Shupak [EMAIL PROTECTED]
Re: Is Bayes Really Necessary?
On 5/26/2005 10:08 AM, Jake Colman wrote: > Given the rather complete set of rules that ship with SA and which can > be expanded with SARE, does bayes learning really help? Won't the rules catch > pretty much everything anyway? The base SA install is insufficient, but if you tweak the scores and add some additional tests, you can get by without bayes just fine. I use a select set of RBLs, Razor, rulesets from rulesemporium, and my own LDAP-based weighting plugin, and my highest spam only gets an average of one spam per day, and even those are over the 5.0 threshold (so they are auto-filed into the Junk Email folder). Bayes is great for per-user stuff, but unless you are willing to manage the per-user databases (which I'm not), it is easier to just tweak the system scores and rules. Less management overhead, less CPU, etc. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
Re: Is Bayes Really Necessary?
Matt Kettler wrote: > jdow wrote: >> One way to keep Bayes from running is to never train it. >> {^_^} > You'd also disable autolearning. By default SA will eventually autolearn enough email to begin using bayes. (and often these pure auto-learn only DBs end up with very bad results.) Often is the keyword here. I guess I'm the exception to that norm ;) But then again, I altered my autolearn thresholds to -0.1 ham/12.0 spam. I believe this is key to using autolearning correctly. (I don't mean these numbers specifically, just the concept). -Jim
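The threshold concept can be sketched as a small decision function. This is a simplified illustration, not SpamAssassin's actual code: the function name autolearn_decision is hypothetical, the boundary handling (strictly below the ham threshold, at or above the spam threshold) is an assumption, and the real autolearner applies further conditions that are ignored here. In SpamAssassin the corresponding settings are bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam.

```python
def autolearn_decision(score: float,
                       ham_threshold: float = -0.1,
                       spam_threshold: float = 12.0) -> str:
    """Decide whether a scored message should be fed to Bayes automatically.

    Defaults mirror the -0.1 ham / 12.0 spam values quoted in the thread.
    """
    if score >= spam_threshold:
        return "spam"   # confident enough to learn as spam
    if score < ham_threshold:
        return "ham"    # confident enough to learn as ham
    return "skip"       # ambiguous: do not train on this message
```

With a ham threshold of -0.1, a low-scoring spam at 0.05 is skipped rather than learned as ham, whereas a threshold of 0.1 would have learned it as ham. That is the failure mode a stricter ham threshold guards against.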
Re: Is Bayes Really Necessary?
jdow wrote: > One way to keep Bayes from running is to never train it. > {^_^} You'd also disable autolearning. By default SA will eventually autolearn enough email to begin using bayes. (and often these pure auto-learn only DBs end up with very bad results.)
Re: Is Bayes Really Necessary?
One way to keep Bayes from running is to never train it. {^_^} - Original Message - From: "Kristopher Austin" <[EMAIL PROTECTED]> We have found Bayes to be more trouble than it's worth. We were frequently running into problems keeping the database stable and fresh. We have a site-wide install so that just made it all the more problematic. It definitely depends on your situation. I don't think anyone can make a blanket statement one way or the other. We have had great success without Bayes and the amount of admin time necessary to keep SA running has dropped significantly. Kris -Original Message- From: Jake Colman [mailto:[EMAIL PROTECTED] Sent: Thursday, May 26, 2005 9:09 AM To: users@spamassassin.apache.org Subject: Is Bayes Really Necessary? Given the rather complete set of rules that ship with SA and which can expanded with SARE, does bayes learning really help? Won't the rules catch pretty much everything anyway? -- Jake Colman
Re: Is Bayes Really Necessary?
* Jim Maul <[EMAIL PROTECTED]>: > I have been running sitewide bayes since the beginning without much > maintenance at all. It has autolearned every message itself and its > dead on balls accurate. I've trained maybe 20 message total manually so > i dont see how running bayes could actually cause more work for an admin > unless its been trained poorly and they have to correct it. I also train it manually with all the spam that slips through (and some ham as well, to keep the balance). -- Ralf Hildebrandt (i.A. des IT-Zentrums) [EMAIL PROTECTED] Charite - Universitätsmedizin BerlinTel. +49 (0)30-450 570-155 Gemeinsame Einrichtung von FU- und HU-BerlinFax. +49 (0)30-450 570-962 IT-Zentrum Standort CBF send no mail to [EMAIL PROTECTED]
Re: Is Bayes Really Necessary?
Ralf Hildebrandt wrote: * Kristopher Austin <[EMAIL PROTECTED]>: We have found Bayes to be more trouble than it's worth. We were frequently running into problems keeping the database stable and fresh. We have a site-wide install so that just made it all the more problematic. We also have a site-wide install with Bayes (15.000 Users). Where is the problem with "keeping the database stable and fresh"? Never crashed here. I have been running sitewide bayes since the beginning without much maintenance at all. It has autolearned every message itself and its dead on balls accurate. I've trained maybe 20 message total manually so i dont see how running bayes could actually cause more work for an admin unless its been trained poorly and they have to correct it. Even then its probably just easier to delete it and start over. I tag spam at 5.0 and have bayes BAYES_99 at 5.4. This one rule alone is enough to mark spam and i havent had any false positives because of it yet. -Jim
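The arithmetic behind "this one rule alone is enough" can be sketched as follows. It is a minimal illustration assuming that a message is tagged when the sum of its rule scores reaches the required score; the function name classify and the rule name SOME_OTHER_RULE are hypothetical, while the 5.0 threshold and the 5.4 score for BAYES_99 come from the message above.

```python
def classify(rule_hits: dict, required_score: float = 5.0):
    """Sum the scores of the rules that fired and compare with the tag threshold."""
    total = sum(rule_hits.values())
    return total, total >= required_score


# BAYES_99 scored at 5.4 crosses a 5.0 threshold with no other rule firing.
bayes_only = classify({"BAYES_99": 5.4})

# A lower score for the same rule would need help from other rules.
needs_help = classify({"BAYES_99": 3.5})
with_help = classify({"BAYES_99": 3.5, "SOME_OTHER_RULE": 2.0})
```

Scoring a single rule above the threshold trades safety for catch rate: any Bayes misfire at that level becomes a false positive with nothing else to confirm it, so it only makes sense with a database that is trusted.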
Re: Is Bayes Really Necessary?
Joe Zitnik wrote: Bayes definitely helps, but auto-learn can cause problems. Perhaps a better question would be, "Is autolearn really necessary?" I think the problems mostly come from accidentally autolearning spam as ham, which is easy with the default threshold. Autolearning messages as spam at a reasonable threshold should be okay. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC
Re: Is Bayes Really Necessary?
I have autolearn off. I have been burned by it twice. >>> <[EMAIL PROTECTED]> 5/26/2005 10:33 AM >>> On Thu, 26 May 2005, Joe Zitnik wrote: > I think points can be made for both sides of the argument. The thing > that makes bayes different, is that a well trained bayes database is > specific to your environment. If you're a law firm, your learned ham is > going to be heavy in legalese, medical related org, heavy in that > terminology. Because spam and ham is learned specific to your > environment, it can make a big difference. > > >>> Jake Colman <[EMAIL PROTECTED]> 5/26/2005 10:08 AM >>> > > Given the rather complete set of rules that ship with SA and which can > be expanded with SARE, does bayes learning really help? Won't the rules > catch > pretty much everything anyway? Bayes definitely helps, but auto-learn can cause problems. Perhaps a better question would be, "Is autolearn really necessary?" James Smallacombe PlantageNet, Inc. CEO and Janitor [EMAIL PROTECTED] http://3.am
Re: Is Bayes Really Necessary?
On Thu, 26 May 2005, Joe Zitnik wrote: > I think points can be made for both sides of the argument. The thing > that makes bayes different, is that a well trained bayes database is > specific to your environment. If you're a law firm, your learned ham is > going to be heavy in legalese, medical related org, heavy in that > terminology. Because spam and ham is learned specific to your > environment, it can make a big difference. > > >>> Jake Colman <[EMAIL PROTECTED]> 5/26/2005 10:08 AM >>> > > Given the rather complete set of rules that ship with SA and which can > be expanded with SARE, does bayes learning really help? Won't the rules > catch > pretty much everything anyway? Bayes definitely helps, but auto-learn can cause problems. Perhaps a better question would be, "Is autolearn really necessary?" James Smallacombe PlantageNet, Inc. CEO and Janitor [EMAIL PROTECTED] http://3.am
RE: Is Bayes Really Necessary?
>-Original Message- >From: Jake Colman [mailto:[EMAIL PROTECTED] >Sent: Thursday, May 26, 2005 10:09 AM >To: users@spamassassin.apache.org >Subject: Is Bayes Really Necessary? > > > >Given the rather complete set of rules that ship with SA and which can >be expanded with SARE, does bayes learning really help? Won't >the rules catch >pretty much everything anyway? Oh my favorite subject!!! :) NO! Bayes is not necessary. IMHO, for personal use, it is incredible. But I feel the care of it is more difficult than your average user would care to keep up. For site wide, I'm pretty much against it. I know people will argue that point. I'm obviously biased towards SARE rules updated with RDJ. And the use of URIBL.com lists. But these allow general users, or a sitewide install, to "set and forget". Which is what we strive for, so SA can be more widely accepted. I have a 99% filter rate without bayes. And I'm proud of that. Chris Santerre System Admin and SARE/URIBL Ninja http://www.rulesemporium.com http://www.uribl.com
Re: Is Bayes Really Necessary?
I think points can be made for both sides of the argument. The thing that makes bayes different, is that a well trained bayes database is specific to your environment. If you're a law firm, your learned ham is going to be heavy in legalese, medical related org, heavy in that terminology. Because spam and ham is learned specific to your environment, it can make a big difference. >>> Jake Colman <[EMAIL PROTECTED]> 5/26/2005 10:08 AM >>> Given the rather complete set of rules that ship with SA and which can be expanded with SARE, does bayes learning really help? Won't the rules catch pretty much everything anyway? -- Jake Colman Sr. Applications Developer Principia Partners LLC Harborside Financial Center 1001 Plaza Two Jersey City, NJ 07311 (201) 209-2467 www.principiapartners.com
Re: Is Bayes Really Necessary?
* Kristopher Austin <[EMAIL PROTECTED]>: > We have found Bayes to be more trouble than it's worth. We were > frequently running into problems keeping the database stable and fresh. > We have a site-wide install so that just made it all the more > problematic. We also have a site-wide install with Bayes (15.000 Users). Where is the problem with "keeping the database stable and fresh"? Never crashed here. -- Ralf Hildebrandt (i.A. des IT-Zentrums) [EMAIL PROTECTED] Charite - Universitätsmedizin BerlinTel. +49 (0)30-450 570-155 Gemeinsame Einrichtung von FU- und HU-BerlinFax. +49 (0)30-450 570-962 IT-Zentrum Standort CBF send no mail to [EMAIL PROTECTED]
Re: Is Bayes Really Necessary?
On Thu, 2005-05-26 at 10:08 -0400, Jake Colman wrote: > Given the rather complete set of rules that ship with SA and which can > expanded with SARE, does bayes learning really help? Won't the rules catch > pretty much everything anyway? I have used SA with Bayes and it took quite a bit of administrative overhead. It worked amazingly well, though. I now run SA with DCC, Razor, Pyzor and network checks and without Bayes and it still Just Works(TM). Seriously - I have customers who slather their e-mail addresses all over Usenet, message boards, on their web pages, etc. They might as well put a big sign up that says SPAM ME PLEASE!!! But they don't get any spam - SA and spamass-milter rejects all of it. It is really amazing - I've got clients who went from hundreds of spams per day down to one or two that slip through per week. Of course, when one gets through, my phone rings! I guess my experience is that either way, SA Just Works(TM). Cheers, Thomas
RE: Is Bayes Really Necessary?
We have found Bayes to be more trouble than it's worth. We were frequently running into problems keeping the database stable and fresh. We have a site-wide install so that just made it all the more problematic. It definitely depends on your situation. I don't think anyone can make a blanket statement one way or the other. We have had great success without Bayes and the amount of admin time necessary to keep SA running has dropped significantly. Kris -Original Message- From: Jake Colman [mailto:[EMAIL PROTECTED] Sent: Thursday, May 26, 2005 9:09 AM To: users@spamassassin.apache.org Subject: Is Bayes Really Necessary? Given the rather complete set of rules that ship with SA and which can expanded with SARE, does bayes learning really help? Won't the rules catch pretty much everything anyway? -- Jake Colman Sr. Applications Developer Principia Partners LLC Harborside Financial Center 1001 Plaza Two Jersey City, NJ 07311 (201) 209-2467 www.principiapartners.com
RE: Is Bayes Really Necessary?
Yes, BAYES is an integral part of SA! It's like a constantly changing rule (without the need to tweak the rule ever so slightly for nuances in the "new" mail). There are mails that don't trip any standard rules, but are caught by bayes alone. Steven -Original Message- From: Jake Colman [mailto:[EMAIL PROTECTED] Sent: Thursday, May 26, 2005 7:09 AM To: users@spamassassin.apache.org Subject: Is Bayes Really Necessary? Given the rather complete set of rules that ship with SA and which can be expanded with SARE, does bayes learning really help? Won't the rules catch pretty much everything anyway? -- Jake Colman Sr. Applications Developer Principia Partners LLC Harborside Financial Center 1001 Plaza Two Jersey City, NJ 07311 (201) 209-2467 www.principiapartners.com