Data has been updated this morning (CEST).
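
For anyone who wants to check from a script whether a given month has
landed, here is a minimal sketch. It rebuilds the query URL quoted further
down in this thread and lists the months present in a response. The helper
names (build_url, months_present, fetch), the User-Agent string, and the
assumed response shape (an "items" list whose entries carry a "results"
list with ISO "timestamp" strings) are illustrative assumptions, not
something confirmed in this thread.

```python
import json
from urllib.request import Request, urlopen

# Base path taken from the query URL quoted in this thread.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/edited-pages/new"


def build_url(project, start, end,
              editor_type="all-editor-types",
              page_type="content",
              granularity="monthly"):
    """Assemble the new-pages query URL; start/end are YYYYMMDD strings."""
    return "/".join(
        [AQS_BASE, project, editor_type, page_type, granularity, start, end]
    )


def months_present(payload):
    """Return the sorted YYYY-MM months found in a parsed response.

    Assumes the payload looks like
    {"items": [{"results": [{"timestamp": "2018-07-01T00:00:00.000Z", ...}]}]}.
    """
    months = set()
    for item in payload.get("items", []):
        for result in item.get("results", []):
            months.add(result["timestamp"][:7])
    return sorted(months)


def fetch(url):
    """Fetch and parse one response (hypothetical User-Agent string)."""
    req = Request(url, headers={"User-Agent": "aqs-month-check (example)"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling months_present(fetch(build_url("commons.wikimedia.org", "20180701",
"20181001"))) and checking whether "2018-09" is in the result would show
whether the September data point is available yet.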

On Thu, Oct 11, 2018 at 5:13 AM Nuria Ruiz <nu...@wikimedia.org> wrote:

> >Wikistats 1 generates data on content pages with a delay of 10-15 days
> after the end of the month
> This is true for full snapshots (for the reasons we have discussed before
> and that Dan has described on this thread). You can expect data to be
> available on the API soon after the 10th, but it is unlikely that it will
> be there before the 10th as we do not start the process until the 5th.
>
> Now, data, as you know, is streamed in real time, every second. So it is
> only the full reconstruction of events, the full snapshot, that takes
> several days to build. Have you looked into using the real-time events when
> the next month's snapshot is not yet available?
>
>
> On Wed, Oct 10, 2018 at 7:48 PM Dan Andreescu <dandree...@wikimedia.org>
> wrote:
>
>> It should be updated soon; the jobs have all finished successfully.  But
>> currently we do expect this kind of lag, and I'll explain why.
>>
>> When we started we were sqooping at the beginning of the month and the
>> processing takes something like 4 days total, most of it sqooping.  But
>> this put too much load on the database servers too close to the beginning of
>> the month when a bunch of other stuff is running.  So we had to move it
>> back to the 5th of the month [1].  Add 4 days onto that and we end up
>> finishing around the 9th of the month.  We don't like this at all and we're
>> trying to figure out a better way to import the data incrementally so we
>> can just start processing when we have all of it.  It's unfortunate but we
>> couldn't foresee the infrastructure limitation, too much was up in the air
>> about even where we would sqoop from when we started this work.  Joseph and
>> I have a weekly meeting to discuss moving towards a more incremental
>> approach, and this task is the parent task to watch for now:
>> https://phabricator.wikimedia.org/T193650 (priority is low because we
>> have too many other commitments, but it's something I'd love to see before
>> we call wikistats 2 "production" quality)
>>
>> [1]
>> https://github.com/wikimedia/puppet/blob/28b78985d3612a6e19720be1fe8eef5f0dfc2ed7/modules/profile/manifests/analytics/refinery/job/sqoop_mediawiki.pp#L43
>>
>> On Wed, Oct 10, 2018 at 10:00 PM Neil Patel Quinn <nqu...@wikimedia.org>
>> wrote:
>>
>>> Hey there!
>>>
>>> I just wrote a script that fetches data from the AQS new pages endpoint
>>> <https://wikimedia.org/api/rest_v1/#!/Edited_pages_data/get_metrics_edited_pages_new_project_editor_type_page_type_granularity_start_end>
>>> in order to prepare our monthly health metrics (T199459
>>> <https://phabricator.wikimedia.org/T199459>).
>>>
>>> However, it seems like that endpoint doesn't yet have monthly data for
>>> September. For example, a query for Commons with a start of July 1 and
>>> an end of October 1
>>> <https://wikimedia.org/api/rest_v1/metrics/edited-pages/new/commons.wikimedia.org/all-editor-types/content/monthly/20180701/20181001>
>>> returns only data for July and August. What's the schedule for updating
>>> this data?
>>>
>>> To be honest, I feel pretty frustrated by this. Wikistats 1 generates
>>> data on content pages with a delay of 10-15 days after the end of the
>>> month, which has made it difficult for us to provide timely metrics to
>>> executives and the board. I had assumed (to a degree that I didn't even
>>> check) that by switching to this API, we would instead only have to deal
>>> with the delay in generating the mediawiki_history snapshot (5-7 days after
>>> the end of the month). But that doesn't seem to be the case.
>>> --
>>> Neil Patel Quinn
>>> <https://meta.wikimedia.org/wiki/User:Neil_P._Quinn-WMF> (he/him/his)
>>> product analyst, Wikimedia Foundation
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>


-- 
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal
