Just took a quick sample of 10 instances of vandalism to [[Ted Stevens]]. Of those 10 instances, either 2 or 4 would not have been found by the automated tool described: 2 if every edit summary containing the word "vandalism" is counted as vandalism, and 4 if not. Counting every such summary would probably significantly overcount vandalism. The four reverts in question:
http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=173527553&oldid=173381871 (Removed vandalism)
http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=180054904&oldid=179982198 (rmv vandalism)
http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=168486242&oldid=168438600 no edit summary
http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=162332870&oldid=162038733 (yes it is funny, but this doesn't belong here)

On Thu, Aug 27, 2009 at 9:31 PM, Thomas Dalton <thomas.dal...@gmail.com> wrote:
> 2009/8/28 Anthony <wikim...@inbox.org>:
> > I suggested a better approach last time we had this thread: statistical
> > sampling.
>
> This research was based on a sample. What are you talking about?

I'm talking about taking a sample and examining it manually.

First, spend a few weeks coming up with an objective definition of vandalism. Then pick 5,000 random article views from the http log, and publish the URL/date/time. Then advertise the list all over the place (especially on sites like Wikipedia Review) asking people to find instances of vandalism in it. People can use automated means which they then go through by hand to remove false positives, manual error checking, spot checking, whatever.

The number of confirmed instances of vandalism will grow for a while, and eventually will start to level off. May not be perfect, but it'll provide a lower bound on the amount of vandalism, at least. Have a statistician tell us what our exact error bounds are.

And then prepare for a second study, improving on everything (the definition of "vandalism", the number of random article views, the amount of time to wait) based on what we learned.
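
To give a rough idea of what "error bounds" could look like for a sample like this, here is a minimal Python sketch using the Wilson score interval for a proportion. It is only an illustration, not what a statistician would necessarily choose: it assumes the 5,000 article views are a simple random sample, and the count of 150 confirmed instances is a made-up number, not a result. Since reviewers can only confirm vandalism they actually find, the interval describes the confirmed rate, which is still a lower bound on the true one.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    hits: number of sampled article views confirmed as vandalized
    n:    total number of sampled article views
    z:    normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical numbers: 150 confirmed instances among 5,000 sampled views.
# The 150 is invented purely to show the calculation.
low, high = wilson_interval(150, 5000)
print(f"confirmed vandalism rate: {150 / 5000:.2%}")
print(f"approx. 95% interval:     {low:.2%} to {high:.2%}")
```

The same function can be rerun as the confirmed count grows, which would show the interval shifting upward until the count levels off.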