Halfak added a comment.
Oh! And to the point of reviewing this specific task, please limit your aggregate analysis to 12 months. This will help account for seasonality. E.g., December/January and September look weird and can appear twice in a 17 month sample.
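A minimal sketch of the 12-month window suggested above, assuming the per-month results are kept in a dict keyed by "YYYY-MM" strings. The helper names and the sample values are hypothetical and are not taken from the actual analysis scripts.

```python
from collections import Counter


def last_n_months(monthly, n=12):
    """Keep only the n most recent entries of a {"YYYY-MM": value} mapping."""
    months = sorted(monthly)  # "YYYY-MM" strings sort chronologically
    return {m: monthly[m] for m in months[-n:]}


def calendar_month_counts(monthly):
    """Count how often each calendar month (01..12) appears in the sample."""
    return Counter(key.split("-")[1] for key in monthly)


# Hypothetical 17-month sample (2017-01 .. 2018-05); values are placeholders.
sample = {f"{2017 + i // 12}-{i % 12 + 1:02d}": i for i in range(17)}
window = last_n_months(sample)

print(calendar_month_counts(sample))  # some calendar months appear twice
print(calendar_month_counts(window))  # every calendar month appears once
```

With the full 17-month sample, January through May are counted twice, so any seasonal oddity in those months is over-weighted in an aggregate. Trimming to 12 months gives each calendar month equal weight.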
Halfak added a comment.
Make a report! Start from https://meta.wikimedia.org/wiki/Research:New_project :)
TASK DETAIL
https://phabricator.wikimedia.org/T189962
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Ladsgroup, Halfak
Cc: Halfak, Aklapper, Ladsgroup,
Ladsgroup added a comment.
This is the plot of the geometric mean of revert time on Wikidata:
F23900542: Figure_1.png
This is the median of reverts made by users who made more than five reverts in that month:
F23900564: Figure_2.png
This is the number of users who made more than five reverts in the month:
Halfak added a comment.
This is great. Please graph the results, write a report, and describe the weirdness in the Spanish dump files.
Halfak added a comment.
We discussed getting the plots cleaned up and adding English/Spanish Wikipedia at our sync meeting.
Ladsgroup added a comment.
This is the new data:
Month    Number of reverts  Average (hour)      First quartile (hour)  Median (hour)  Last quartile (hour)  Geo mean (seconds)
2013-02  6821               2.1991468341233773  0.005278               0.02167        0.1519                132.86284580032776
Ladsgroup added a comment.
More data:
Month    Number of users reverting  Average number of reverts per user  First quartile  Median  Last quartile
2013-03  1198                       10.473288814691152                  1.0             1.0     4.0
2013-04  1133                       19.50397175639894                   1.0             2.0     4.0
2013-05  1153                       15.333044232437121                  1.0             2.0     4.0
2013-06  1134                       12.961199294532628                  1.0             1.0     3.0
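For reference, per-user figures like these could be derived roughly as follows. This is only a sketch: it assumes the revert metadata can be reduced to one (month, user) pair per revert, and both the function name and the choice of the "inclusive" quantile method are assumptions, not the method used for the table above.

```python
from collections import Counter, defaultdict
from statistics import mean, quantiles


def per_user_summary(reverts):
    """Summarise reverts per user for each month.

    reverts: iterable of (month, user) pairs, one pair per revert.
    Each month needs at least two reverting users for the quartiles.
    """
    counts = defaultdict(Counter)
    for month, user in reverts:
        counts[month][user] += 1

    summary = {}
    for month, per_user in sorted(counts.items()):
        values = sorted(per_user.values())
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summary[month] = {
            "users": len(values),   # number of users reverting
            "mean": mean(values),   # average reverts per user
            "q1": q1,
            "median": median,
            "q3": q3,
        }
    return summary


# Hypothetical input: five users with 1, 1, 2, 4 and 12 reverts in one month.
example = (
    [("2013-03", "A")] + [("2013-03", "B")] + [("2013-03", "C")] * 2
    + [("2013-03", "D")] * 4 + [("2013-03", "E")] * 12
)
print(per_user_summary(example))
```

A mean well above the last quartile, as in the table, is what a handful of very heavy reverters produces in this kind of summary.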
Halfak added a comment.
I think we need a plot of this data.
I'd also suggested using the geometric mean for looking at time-to-revert. E.g., in R: geometric_mean = function(x){ exp(mean(log(x))) }
In Python, I'd do:
>>> from statistics import mean
>>> from math import log, exp
>>>
>>> def geometric_mean(x):
...     return exp(mean(log(v) for v in x))
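To illustrate why the geometric mean suits time-to-revert, here is a small self-contained sketch comparing it with the arithmetic mean on a skewed sample. The revert times are invented for illustration, and `geo_mean` is a hypothetical name. Every value must be strictly positive, so zero-second reverts would need separate handling.

```python
from math import exp, log
from statistics import geometric_mean, mean


def geo_mean(times):
    """exp of the mean of the logs; every value must be > 0."""
    return exp(mean(log(t) for t in times))


# Hypothetical revert times in seconds: mostly fast, one very slow.
times = [10, 20, 40, 80, 86400]

print(mean(times))            # pulled up by the single slow revert
print(geo_mean(times))        # stays close to the typical revert
print(geometric_mean(times))  # stdlib equivalent (Python 3.8+)
```

On Python 3.8 or later, `statistics.geometric_mean` computes the same quantity directly, so the hand-written version is mainly useful for showing what is being calculated.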
Ladsgroup added a comment.
This is the result of the analysis (note that I omitted any revert that took more than 48 hours from the average):
Month    Number of reverts  Average revert time (seconds)  Average revert time (hours)
2013-02  6821               7916.92860284416               2.199146834123378
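A rough sketch of this kind of aggregation, assuming each revert is available as a (month, seconds-to-revert) pair. The function name, the input layout, and the sample numbers are hypothetical; only the 48-hour cutoff comes from the comment above.

```python
from collections import defaultdict
from statistics import mean

MAX_SECONDS = 48 * 3600  # the 48-hour cutoff described above


def monthly_average(reverts, max_seconds=MAX_SECONDS):
    """Return {month: (count, mean_seconds, mean_hours)}.

    reverts: iterable of (month, seconds_to_revert) pairs.
    Reverts slower than max_seconds are dropped before averaging.
    """
    by_month = defaultdict(list)
    for month, seconds in reverts:
        if seconds <= max_seconds:
            by_month[month].append(seconds)
    return {
        month: (len(v), mean(v), mean(v) / 3600)
        for month, v in sorted(by_month.items())
    }


# Hypothetical input: two quick reverts and one that exceeds the cutoff.
example = [("2013-02", 60), ("2013-02", 7140), ("2013-02", 200000)]
print(monthly_average(example))
```

Dropping slow reverts bounds the influence of outliers on the arithmetic mean, but the result then depends on where the cutoff is placed.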
Ladsgroup added a comment.
Analysis of the dump has finished, and all of the reverted/reverting edits (metadata only) come to a 2.7GB file. I need to download it from stat1005 and run some scripts on it to make some histograms and data.