On Sat, 2007-03-10 at 10:17 -0500, François Pinard wrote:
> [Marc Schwartz]
>
> >The "Human Spam Filter" (aka Martin) [...]
>
> The R mailing list has, indeed, been remarkably spam-free, and
> well-managed so far that I can see. I do hope, however, that Martin
> does not have to do the filtering himself -- it would be just daunting!
>
> In any case, Martin, a lot of thanks from me!

The comment was somewhat "tongue-in-cheek".

While a major proportion of spam can be filtered using automated tools, it takes a significant amount of manual effort to configure those tools to achieve the level of cleansing that we observe here.

On my system (a laptop running FC6 Linux), I am using SpamAssassin with Bayesian filtering enabled, along with remote spam checks such as DCC, Razor, Pyzor and some RBLs. I also recently started using FuzzyOCR (as a plug-in to SA) to enhance the filtering of spam containing only graphic content. These e-mails are, of course, specifically designed to defeat text-based spam filtering.

However, I still get some that come through despite the above. There are also 'borderline' e-mails that require manually running the spam/ham learning scripts.

To increase the filtering effectiveness to the level we see here, I would have to spend a fair amount of time writing custom rules for SA, and this is where, I have no doubt, Martin spends a lot of his time with list management.

HTH,

Marc Schwartz