One quick thought: it might be good to return a fixed-point value (say, 0.000 to 1.000) from the validate function. Some validators could use a Bayesian filter (or other technique) that returns a probability that a given comment is spam. It may even be possible to add (or take a weighted average of) the returned values. If both the Bayesian and the link count filters return a reasonably high likelihood of spam, that could have a cumulative effect. Of course, a filter that is "certain" that something is spam could always return "1.000"... You could also have something like:
0.00 - 0.25 = Publish
0.25 - 0.75 = Moderate
0.75+       = Reject (but possibly archive?)

Users could even adjust the thresholds of their filter based on how much time they are willing to spend moderating.
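To make that a bit more concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: the class name, the `combine` and `decide` methods, and the per-validator weights are mine, not anything in Roller or in the proposal. I have also assumed that a score sitting exactly on a boundary falls into the stricter bucket, and that a validator returning 1.0 overrides the average.

```java
// Hypothetical sketch: none of these names exist in Roller.
class SpamScoreSketch {

    enum Action { PUBLISH, MODERATE, REJECT }

    /**
     * Combines per-validator spam likelihoods (each in [0, 1]) into one score
     * using a weighted average. Assumption: a validator that is "certain"
     * (returns 1.0 or more) short-circuits and forces the result to 1.0.
     */
    static double combine(double[] likelihoods, double[] weights) {
        if (likelihoods.length != weights.length) {
            throw new IllegalArgumentException("one weight per validator is required");
        }
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < likelihoods.length; i++) {
            if (likelihoods[i] >= 1.0) {
                return 1.0; // a "certain" validator wins outright
            }
            weightedSum += likelihoods[i] * weights[i];
            totalWeight += weights[i];
        }
        // With no validators (or all-zero weights) there is no evidence of spam.
        return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
    }

    /**
     * Maps a combined score to an action. The two thresholds are meant to be
     * user-adjustable; 0.25 and 0.75 would match the ranges above.
     */
    static Action decide(double score, double moderateAt, double rejectAt) {
        if (score >= rejectAt) {
            return Action.REJECT;
        }
        if (score >= moderateAt) {
            return Action.MODERATE;
        }
        return Action.PUBLISH;
    }

    public static void main(String[] args) {
        // Pretend a Bayesian filter said 0.9 (weight 2) and a link counter said 0.6 (weight 1).
        double combined = combine(new double[] {0.9, 0.6}, new double[] {2.0, 1.0});
        System.out.println(combined + " -> " + decide(combined, 0.25, 0.75));
    }
}
```

A blogger with little time to moderate could narrow the middle band, for example by calling `decide(score, 0.5, 0.6)`, so fewer comments land in the moderation queue.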


[A funny aside: A blogger (I think it was Jeff Jarvis) was accused of "censoring" comments about "socialism". He finally realized that the word "socialism" contained the word "Cialis" which is a Viagra-like drug, and was triggering his filter.]

Let's see if the mailing list lets /this/ message through...

-- Sean

Dave wrote:
Currently, we've got a couple of different ways to control comment
spam in Roller.

   * Three levels of blacklist: comments that match blacklist are
marked as spam
         o Built in blacklist: based on old unsupported MT blacklist
         o Site wide blacklist: global admin manages this blacklist
         o Website blacklist: each weblog can define a blacklist

   * Comment moderation: when enabled, comments must be approved by blog owner

   * CommentAuthenticator: determines if user is allowed to comment
         o You can plug in your own by implementing the comment
authenticator interface
         o Default authenticator does nothing
         o Math Authenticator presents math question, verifies answer
         o CAPTCHA authenticator is possible too, but we don't ship one

   * Comment throttle: IP addresses that send rapid-fire comments are banned

There are problems with each of those methods and even when combined
they're not enough to control spam. We've discussed other ideas for
comment spam control like forcing long comments into moderation,
rejecting comments with too many links and rejecting comments judged
by Akismet to be spam. Those are all good ideas, but if we start
adding special rules ad hoc, we'll end up with a mess.

What we need is a way for Roller site administrators to define a chain
of comment validators so that we and others can add comment spam
processing rules, which are then treated in a uniform way in the
Roller comment servlet.

Read the rest here:
http://rollerweblogger.org/wiki/Wiki.jsp?page=Proposal_CommentValidators

Please respond with comments here on the list.

- Dave

