I just took a quick sample of 10 instances of vandalism to [[Ted Stevens]].
Of those 10, either 2 or 4 would not have been found by the automated tool
described: 2 if every edit summary containing the word "vandalism" is
counted as vandalism, and 4 if not.  The former would probably overcount
vandalism significantly.

http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=173527553&oldid=173381871
(Removed vandalism)
http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=180054904&oldid=179982198
(rmv vandalism)
http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=168486242&oldid=168438600
no edit summary
http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=162332870&oldid=162038733
(yes it is funny, but this doesn't belong here)

On Thu, Aug 27, 2009 at 9:31 PM, Thomas Dalton <thomas.dal...@gmail.com> wrote:

> 2009/8/28 Anthony <wikim...@inbox.org>:
> > I suggested a better approach last time we had this thread: statistical
> > sampling.
>
> This research was based on a sample. What are you talking about?


I'm talking about taking a sample and examining it manually.  First, spend a
few weeks coming up with an objective definition of vandalism.  Then pick
5,000 random article views from the HTTP log, and publish the URL, date, and
time of each.  Then advertise the list all over the place (especially on
sites like Wikipedia Review) asking people to find instances of vandalism in
it.  People can use automated tools and then go through the results by hand
to remove false positives, or use manual error checking, spot checking,
whatever.  The number of confirmed instances of vandalism will grow for a
while, and eventually will start to level off.
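
For what it's worth, the "pick 5,000 random article views" step could be
sketched roughly as below.  This is only an illustration under assumptions:
it treats the log as a plain iterable of lines, one view per line, and the
function name sample_views is made up for this example.  It uses reservoir
sampling so the whole log never has to fit in memory.

```python
import random


def sample_views(log_lines, k=5000, seed=None):
    """Return up to k lines chosen uniformly at random from log_lines.

    Hypothetical helper: assumes each line of the log is one article view.
    Uses reservoir sampling (Algorithm R), so the log is read only once.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(log_lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir
```

Publishing the seed along with the list would let anyone with access to the
same log reproduce the sample.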

May not be perfect, but it'll provide a lower bound on the amount of
vandalism, at least.  Have a statistician tell us what our exact error
bounds are.  And then prepare for a second study, improving on everything
(the definition of "vandalism", the number of random article views, the
amount of time to wait) based on what we learned.
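
As a rough idea of what the error bounds might look like, here is a sketch
that computes a Wilson score interval for the fraction of sampled views that
were confirmed as vandalism.  The choice of the Wilson interval and the name
wilson_interval are my assumptions, not something a statistician has signed
off on; and since the confirmed count is only a lower bound, the interval
describes the confirmed rate rather than the true one.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion hits / n.

    hits: number of sampled views confirmed as vandalism (hypothetical input)
    n:    total number of sampled views
    z:    normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)
```

Calling wilson_interval(confirmed, 5000) once the confirmed count has
levelled off would give a first estimate to hand to the statistician.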
_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
