Hi Oliver,

On Thu, Mar 12, 2015 at 07:44:14PM -0400, Oliver Keyes wrote:
> On 12 March 2015 at 19:41, Erik Zachte <ezac...@wikimedia.org> wrote:
> >>>> Well, again; the wikistats data that Erik refers to doesn't have any
> >>>> granularity within the period this dataset covers.
> >
> > So I just uploaded 
> > https://commons.wikimedia.org/wiki/File:PageViewsWikipedia2015.png
> > which shows daily page views as collected by webstatscollector since 2008 
> > and published in hourly projectcounts files in
> > https://dumps.wikimedia.org/other/pagecounts-raw/
> > and aggregated by Wikistats per project (by week, month, day of week) and 
> > published in e.g.
> > http://stats.wikimedia.org/EN/TablesPageViewsMonthlyOriginalCombined.htm
> > (Wikipedia only, but webstatscollector doesn't report on any huge PV 
> > increase for other projects)
> >
> > My initial comment in this thread (again) is that you defined a 'legacy' 
> > definition yourself, and built a script to implement your legacy definition.
> 
> Actually, no; the UDF is a replica of the Hive implementation of your
> definition, which Christian wrote.

I am with Erik when he disputes that it is “his” definition.

It is webstatscollector's definition, which originates (as far as git
logs tell) from Domas in 2008 [1], and has seen some updates since
from other people like Hampton and Diederik.
I think all of them did great work.

Almost 7 years after its implementation, it is still the yardstick at
WMF to measure page views by. That's a great achievement. Kudos!

Erik's wonderful reports /use/ data that is based on those definitions.
And Christian only ported the webstatscollector C-implementation to
Hive.

---------------------

Despite the efforts to update the webstatscollector pageview
definition, I heard that technical limitations got in the way back
then. In effect, MediaWiki, the WMF-hosted wikis, and the shape of the
corresponding request stream changed more often and more heavily than
the webstatscollector definition was updated.
Hence, now that those technical limitations are gone, there is a need
to overhaul the pageview definition.

From my point of view, the numbers computed by the webstatscollector
pageview definition and those computed by the overhauled pageview
definition need not agree.

But with the webstatscollector pageview definition being the yardstick
... having an understanding within the organization of where/why/how
those numbers differ would not hurt.

YMMV.



> Unfortunately I've been moved from R&D, and don't have the time to
> answer endless "just one more thing..." questions.

I have to admit that if you're not interested in doing QA, then the
thread's subject of “final pageviews QA” misled me.
I adjusted accordingly.



Have fun,
Christian



[1] 
https://git.wikimedia.org/commit/analytics%2Fwebstatscollector.git/7617da88b9fa36dcf3bce593ad9a6da3bf9ec325



-- 
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  christ...@quelltextlich.at
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------


_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
