Rufus Pollock wrote:
> CKAN is getting more 'backhanded compliments' in the form of spamming:
>
> http://www.ckan.net/revision/
> http://www.ckan.net/revision/read/1978
>
How charming. :-)

> Admins can keep purging this by hand [1] but this isn't hugely
> efficient. In fact around christmas to deal with a bad repeated attack I
> implemented some more sophisticated support ([2],[3]) including
> blacklisting but didn't fully integrate this into controllers. Before we
> go ahead and do more I wonder if anyone else has comments as to how best
> to deal with spam on 'world-editable' systems. In particular what are
> people's views on the effectiveness (and cost of implementation) of
> things like:
>
> * captchas (and texchas)
>

Very effective for screening out machine clients.

> * ip blacklisting
>

Not so effective over time. You just keep adding addresses, and they just keep using new ones. Stops persistent morons, but how many of those are there? :-)

> * bayesian spam filtering of some kind
>

Makes a strong contribution, with an inevitable rate of error. Could work off the blog spamming databases? Could be used to selectively prompt for moderator approval.

One could think about this all day, and I'm sure the list is endless, as it's effectively an arms race. However...

It appears that you win decisively by having a lot of people watching the recent changes list all the time.

You can also reduce the incentives, for example by delaying what search engines get to see by 12 hours, so you have a chance to remove the spam before it gets indexed (the CKAN spammer's goal, probably). You might not do this for trusted users, i.e. users who have made some changes and have not added any spam. I would think there would be many variations on the delaying trick. Another is not leaving the spam as a visible revision; CKAN does at the moment, right?

You could also have different operating modes; one might allow you to be alerted about contributions and then approve them. You might want to have somebody cover this when you are away, or make this an OKF officer position (humans are fairly good at classifying spam)?
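
Going back to the 12-hour delay for search engines: a minimal sketch of how that check might look, in Python. Everything named here (`Revision`, `author_trusted`, `is_crawler`, `visible_to`, the list of crawler hints) is made up for illustration and is not CKAN's actual model or API. User-Agent sniffing is also only a crude stand-in for real search engine detection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

INDEX_DELAY = timedelta(hours=12)  # the 12-hour figure suggested above
# Assumed, incomplete list of substrings seen in crawler User-Agent headers.
CRAWLER_HINTS = ("googlebot", "bingbot", "slurp")


@dataclass
class Revision:
    author_trusted: bool  # hypothetical flag: author has prior clean edits
    created: datetime


def is_crawler(user_agent: str) -> bool:
    """Crude User-Agent check; real crawler detection is harder than this."""
    ua = user_agent.lower()
    return any(hint in ua for hint in CRAWLER_HINTS)


def visible_to(revision: Revision, user_agent: str, now: datetime) -> bool:
    """Humans see every revision at once; crawlers see only aged or trusted ones."""
    if not is_crawler(user_agent):
        return True
    if revision.author_trusted:
        return True
    return now - revision.created >= INDEX_DELAY
```

The point of the design is that ordinary visitors never notice the delay, while a spammer's edit has to survive 12 hours of people watching recent changes before a search engine ever sees it.
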
You might want to be able to switch to "anything goes", or to a "lock-down" mode in case the site comes under attack, or if you simply feel snowed under by a steady stream of spam, or if you need to leave it unattended for a period of time. There would need to be some kind of catch-up notifications, so contributors know when things have gone through, or that the site is open again.

I suppose this is called moderation. :-) Given that CKAN is a collection, it might also be called curation.

You can get creative, so you might require approval for changes that are made without an OpenID. Or when there are more than two changes. Or always, unless they are known and trusted. Or changes by somebody other than the original registrant. Or when the contribution is classified as spam by a pattern recognition function.

Lots of possibilities; some may do more harm than good, and doing lots of things would probably be a bad thing.

Overall, the main aim would be to frustrate the ways in which CKAN is actually spammed (whilst not destroying the purpose and openness). And combine this with flexibility, so you can regulate access according to prevailing conditions, much like a mailing list/blog/wiki.

I would guess that the CKAN spam is mostly occasional, by a human client, and is intended to be picked up by Google et al. So I would:

1. forget about captchas,

2. think about snagging contributions which look like spam to an automaton, for approval by a moderator,

3. add a "this is spam" button which prevents anybody from seeing the spam from that point forward (maybe you need to support people applying to see hidden spam, so there is no suspicion about censoring real content, maybe obfuscated from search engines by Javascript),

4. maybe combine this with search engine detection and presentation to them of a delayed view of the registry.

That shouldn't cost too much, would cut out most of the spam, and would cut out most of the admin work.
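
To make the operating modes and approval rules a bit more concrete, here is a rough sketch in Python. All of the names (`Mode`, `Action`, `Edit`, `looks_like_spam`, `decide`) are invented for illustration and none of this is existing CKAN code. The keyword check is only a placeholder for a real Bayesian or other pattern recognition function, and treating lock-down as "hold everything for later" is my assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    OPEN = "anything goes"
    MONITORED = "actively monitored"
    LOCKDOWN = "lock-down"


class Action(Enum):
    PUBLISH = "publish"
    HOLD = "hold for moderator"


@dataclass
class Edit:
    text: str
    has_openid: bool      # hypothetical: contributor logged in with an OpenID
    author_trusted: bool  # hypothetical: contributor has prior clean edits


SPAM_WORDS = {"viagra", "casino", "pills"}  # placeholder, not a real filter


def looks_like_spam(text: str) -> bool:
    """Stand-in for a Bayesian or other pattern-recognition classifier."""
    lowered = text.lower()
    words = set(lowered.split())
    return bool(words & SPAM_WORDS) or lowered.count("http://") > 3


def decide(edit: Edit, mode: Mode) -> Action:
    """Pick what happens to a contribution under the current operating mode."""
    if mode is Mode.LOCKDOWN:
        return Action.HOLD  # assumption: queue everything until reopened
    if mode is Mode.OPEN or edit.author_trusted:
        return Action.PUBLISH
    if not edit.has_openid or looks_like_spam(edit.text):
        return Action.HOLD
    return Action.PUBLISH
```

Something like `decide(edit, Mode.MONITORED)` would then only bother a moderator with anonymous or spam-looking edits, and switching the mode is a single setting an admin can change according to prevailing conditions.
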
It wouldn't have become a "moderated" resource, just "actively monitored".

How much does purging CKAN spam by hand "cost" the OKF? Is most of the spam light and irregular?

Best wishes,

John.

> ~rufus
>
> [1]:<http://lists.okfn.org/pipermail/okfn-help/2007-October/000038.html>
> [2]:<http://knowledgeforge.net/ckan/trac/changeset/205>
> [3]:<http://knowledgeforge.net/ckan/trac/changeset/202>
