Data has been updated this morning (CEST).

On Thu, Oct 11, 2018 at 5:13 AM Nuria Ruiz <nu...@wikimedia.org> wrote:
> > Wikistats 1 generates data on content pages with a delay of 10-15 days
> > after the end of the month
>
> This is true for full snapshots (for the reasons we have discussed before
> and that Dan has described on this thread). You can expect data to be
> available on the API soon after the 10th, but it is unlikely that it will
> be there before the 10th, as we do not start the process until the 5th.
>
> Now, data, as you know, is streamed in real time, every second. So it is
> only the full reconstruction of events, the full snapshot, that takes
> several days to build. Have you looked into using the real-time events when
> the next month's snapshot is not yet available?
>
> On Wed, Oct 10, 2018 at 7:48 PM Dan Andreescu <dandree...@wikimedia.org>
> wrote:
>
>> It should be updated soon; the jobs have all finished successfully. But
>> currently we do expect this kind of lag, and I'll explain why.
>>
>> When we started, we were sqooping at the beginning of the month, and the
>> processing takes something like 4 days total, most of it sqooping. But
>> this put too much load on the database servers too close to the beginning
>> of the month, when a bunch of other stuff is running. So we had to move it
>> back to the 5th of the month [1]. Add 4 days onto that and we end up
>> finishing around the 9th of the month. We don't like this at all, and we're
>> trying to figure out a better way to import the data incrementally so we
>> can just start processing when we have all of it. It's unfortunate, but we
>> couldn't foresee the infrastructure limitation; too much was up in the air
>> about even where we would sqoop from when we started this work. Joseph and
>> I have a weekly meeting to discuss moving towards a more incremental
>> approach, and this task is the parent task to watch for now:
>> https://phabricator.wikimedia.org/T193650 (priority is low because we
>> have too many other commitments, but it's something I'd love to see before
>> we call wikistats 2 "production" quality)
>>
>> [1]
>> https://github.com/wikimedia/puppet/blob/28b78985d3612a6e19720be1fe8eef5f0dfc2ed7/modules/profile/manifests/analytics/refinery/job/sqoop_mediawiki.pp#L43
>>
>> On Wed, Oct 10, 2018 at 10:00 PM Neil Patel Quinn <nqu...@wikimedia.org>
>> wrote:
>>
>>> Hey there!
>>>
>>> I just wrote a script that fetches data from the AQS new pages endpoint
>>> <https://wikimedia.org/api/rest_v1/#!/Edited_pages_data/get_metrics_edited_pages_new_project_editor_type_page_type_granularity_start_end>
>>> in order to prepare our monthly health metrics (T199459
>>> <https://phabricator.wikimedia.org/T199459>).
>>>
>>> However, it seems like that endpoint doesn't yet have monthly data for
>>> September. For example, a query for Commons with a start of July 1 and
>>> an end of October 1
>>> <https://wikimedia.org/api/rest_v1/metrics/edited-pages/new/commons.wikimedia.org/all-editor-types/content/monthly/20180701/20181001>
>>> returns only data for July and August. What's the schedule for updating
>>> this data?
>>>
>>> To be honest, I feel pretty frustrated by this. Wikistats 1 generates
>>> data on content pages with a delay of 10-15 days after the end of the
>>> month, which has made it difficult for us to provide timely metrics to
>>> executives and the board. I had assumed (to a degree that I didn't even
>>> check) that by switching to this API, we would instead only have to deal
>>> with the delay in generating the mediawiki_history snapshot (5-7 days
>>> after the end of the month). But that doesn't seem to be the case.
>>> --
>>> Neil Patel Quinn
>>> <https://meta.wikimedia.org/wiki/User:Neil_P._Quinn-WMF> (he/him/his)
>>> product analyst, Wikimedia Foundation
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics

--
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal
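For anyone scripting against the same endpoint, here is a minimal sketch of how one might detect which months are still missing from a response, such as the September gap described in the thread. The URL layout is taken from the example query above. Everything else is an assumption made for illustration: the helper names (`build_url`, `fetch`, `expected_months`, `missing_months`) are hypothetical, the response is assumed to look like `{"items": [{"results": [{"timestamp": ...}]}]}`, the end date is treated as exclusive, and the User-Agent string is a placeholder.

```python
import json
import urllib.request
from datetime import date

# Base path copied from the example query in the thread.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/edited-pages/new"


def build_url(project, start, end, editor_type="all-editor-types",
              page_type="content", granularity="monthly"):
    """Assemble the request URL in the same layout as the thread's example."""
    return "/".join([
        AQS_BASE, project, editor_type, page_type, granularity,
        start.strftime("%Y%m%d"), end.strftime("%Y%m%d"),
    ])


def fetch(url, user_agent="example-metrics-script/0.1 (placeholder contact)"):
    """Download and decode the JSON payload (the User-Agent is a placeholder)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def expected_months(start, end):
    """First-of-month dates from start up to, but not including, end.

    Assumes start is itself the first day of a month.
    """
    months = []
    year, month = start.year, start.month
    while date(year, month, 1) < end:
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def missing_months(payload, start, end):
    """Months in [start, end) with no row in the payload.

    Assumes each result row carries an ISO 8601 "timestamp" field.
    """
    returned = set()
    for item in payload.get("items", []):
        for row in item.get("results", []):
            returned.add(date.fromisoformat(row["timestamp"][:10]))
    return [m for m in expected_months(start, end) if m not in returned]
```

A script could call `missing_months(fetch(build_url("commons.wikimedia.org", date(2018, 7, 1), date(2018, 10, 1))), date(2018, 7, 1), date(2018, 10, 1))` and, if the list is non-empty, report the gap or postpone the run rather than silently publishing incomplete metrics. Given the schedule described in the thread (sqoop starting on the 5th, roughly 4 days of processing), a non-empty result before about the 10th would be expected.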