The idea of having something like this as a Nutch module (dropping pages or ranking them very low) might come up :-)
That is still a very long way off. I am collecting some thoughts and a list of web-spam-related papers on my blog: http://www.find23.net/Web-Site/blog/521BA1CD-14C4-4E84-A072-F98E13CAEFE1.html
Feedback is welcome. Stefan