On Fri, Aug 28, 2009 at 12:43 AM, Brion Vibber <br...@wikimedia.org> wrote:
> On 8/27/09 9:39 PM, Thomas Dalton wrote:
> > 2009/8/28 Gregory Maxwell<gmaxw...@gmail.com>:
> >> If the results of this kind of study have good agreement with
> >> mechanical proxy metrics (such as machine detected vandalism) our
> >> confidence in those proxies will increase, if they disagree it will
> >> provide an opportunity to improve the proxies.
> >
> > This kind of intensive study on a few small sample with a more
> > automated method used on the same sample to compare would be more
> > achievable. If the automated method gets similar results, we can use
> > that method for larger samples.
>
> I would certainly be interested in seeing such a result.

Can you get us 5000 random article views from the http log made during
the first half of 2009? All we need is URL/date/time. Everything else
can be blanked for anonymizing. It can be from a 1/10th log or whatever.
The list should consist solely of *views*, not edits, and only of
articles.

All the rest of the data is out there, unless we happen to hit on a
deleted/oversighted revision.

But using http://dammit.lt/wikistats/ to estimate the hits is less
accurate. Many popular pages get popular suddenly, and then quickly fade
away. There is most likely a strong correlation to the amount of
vandalism that takes place while they are popular to the amount of
vandalism that takes place while they are not popular, so I'd much
prefer a sample from the actual http log. If we can't get the real
thing, I'll start downloading from http://dammit.lt/wikistats/ and
generate an estimated one, though.

Once we have the list, anyone is free to examine it any way they want,
and show their results. But we're talking about probably less than 200
instances of vandalism here, so it'll be quite easy (and fun) to
lambaste anyone whose methods produce false positives.

If you're going to do it, maybe we should work on a rough-consensus
objective definition of "vandalism" before you release the file,
though...
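
For what it's worth, the sampling request above could be sketched roughly as follows. This is only an illustration under assumptions that are not from this thread: the log line layout (timestamp first, URL second, whitespace-separated), the `/wiki/` URL convention for plain views, the short namespace list, and the names `is_article_view` and `sample_views` are all hypothetical. It uses reservoir sampling, so every qualifying *view* in the log has the same chance of being picked, which means pages get weighted by how often they were actually viewed.

```python
import random
from urllib.parse import urlsplit, unquote

# Hypothetical, incomplete list of non-article namespace prefixes.
NON_ARTICLE_PREFIXES = (
    "Talk:", "User:", "User_talk:", "Wikipedia:", "File:",
    "Special:", "Template:", "Category:", "Help:", "Portal:",
)


def is_article_view(url):
    """Return True if the URL looks like a plain article view.

    Assumption: plain views use /wiki/<Title> with no query string,
    while edits, history, and old revisions go through a query string.
    """
    parts = urlsplit(url)
    if not parts.path.startswith("/wiki/"):
        return False
    if parts.query:
        return False
    title = unquote(parts.path[len("/wiki/"):])
    return bool(title) and not title.startswith(NON_ARTICLE_PREFIXES)


def sample_views(lines, k, rng=None):
    """Reservoir-sample up to k (timestamp, url) pairs from log lines.

    Assumption: each line is "<timestamp> <url> ..."; any further
    fields are ignored, so they never reach the output.
    """
    rng = rng or random.Random()
    sample = []
    seen = 0  # number of qualifying article views seen so far
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed lines
        timestamp, url = fields[0], fields[1]
        if not is_article_view(url):
            continue
        seen += 1
        if len(sample) < k:
            sample.append((timestamp, url))
        else:
            # Keep the new view with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                sample[j] = (timestamp, url)
    return sample
```

Something like `sample_views(open("sampled.log"), 5000)` would then give the URL/date/time list, with `sampled.log` standing in for whatever 1/10th log is actually available. Only the first two fields are kept, which matches the "everything else can be blanked" part.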
_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l