On Thu, Aug 27, 2009 at 9:47 PM, Anthony <wikim...@inbox.org> wrote:
> Just took a quick sample of 10 instances of vandalism to [[Ted Stevens]].
> Of those 10 instances of vandalism, either 2 or 4 would not have been found
> by the automated tool described. 2 if every edit summary containing the
> word "vandalism" is counted as vandalism, and 4 if not. The former would
> probably significantly overcount vandalism.
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=173527553&oldid=173381871
> (Removed vandalism)
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=180054904&oldid=179982198
> (rmv vandalism)
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=168486242&oldid=168438600
> no edit summary
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=162332870&oldid=162038733
> (yes it is funny, but this doesn't belong here)
>
> On Thu, Aug 27, 2009 at 9:31 PM, Thomas Dalton <thomas.dal...@gmail.com>
> wrote:
>
> > 2009/8/28 Anthony <wikim...@inbox.org>:
> > > I suggested a better approach last time we had this thread: statistical
> > > sampling.
> >
> > This research was based on a sample. What are you talking about?
>
> I'm talking about taking a sample and examining it manually. First, spend a
> few weeks coming up with an objective definition of vandalism. Then pick
> 5,000 random article views from the http log, and publish the URL/date/time.
> Then advertise the list all over the place (especially on sites like
> Wikipedia Review) asking people to find instances of vandalism in it.
> People can use automated means which they then go through by hand to remove
> false positives, manual error checking, spot checking, whatever. The number
> of confirmed instances of vandalism will grow for a while, and eventually
> will start to level off.
>
> May not be perfect, but it'll provide a lower bound on the amount of
> vandalism, at least. Have a statistician tell us what our exact error
> bounds are. And then prepare for a second study, improving on everything
> (the definition of "vandalism", the number of random article views, the
> amount of time to wait) based on what we learned.
>
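As a rough sketch of the "exact error bounds" a statistician might attach to the 5,000-view sample (not something proposed in the thread), the Wilson score interval is one standard way to put a 95% confidence interval on a proportion estimated from a simple random sample. The function name `wilson_interval` and the count of 25 confirmed cases are hypothetical; only the sample size of 5,000 article views comes from the message.

```python
import math


def wilson_interval(confirmed: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion confirmed / n.

    z = 1.96 gives roughly a 95% confidence level. Assumes the n article
    views were drawn as a simple random sample (an assumption, not a given).
    """
    if n <= 0 or not 0 <= confirmed <= n:
        raise ValueError("need n > 0 and 0 <= confirmed <= n")
    p_hat = confirmed / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z * z / (4 * n * n)
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    n_views = 5000   # sample size suggested in the thread
    confirmed = 25   # HYPOTHETICAL count of confirmed vandalism, illustration only
    low, high = wilson_interval(confirmed, n_views)
    print(f"confirmed rate {confirmed / n_views:.2%}, 95% CI {low:.2%} to {high:.2%}")
```

The interval describes the rate of *confirmed* vandalism only. Since reviewers can miss cases, the true rate can only be at or above the confirmed rate, which matches the message's point that the exercise yields a lower bound.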
Out of curiosity, Anthony, do you still refrain from editing Wikimedia projects over licensing issues? How long has it been, a year?

Nathan

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l