Really good summary of the situation, Neil. I'm bookmarking this and will reuse it when people ask :)

On Thu, Mar 22, 2018 at 7:07 AM, Neil Patel Quinn <nqu...@wikimedia.org> wrote:

> On 22 March 2018 at 13:41, Neil Patel Quinn <nqu...@wikimedia.org> wrote:
>
>> Both the edit data and pageview data that you're talking about come from
>> the Hadoop-based Analytics Data Lake
>> <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake>. However,
>> because of limitations in the underlying MediaWiki application databases
>> <https://www.mediawiki.org/wiki/Manual:Database_layout> *that Hive pulls
>> edit data from*, the data requires some complex reconstruction and
>> denormalization
>> <https://wikitech.wikimedia.org/wiki/Analytics/Systems/Data_Lake/Edits/Pipeline>
>> that takes several days to a week.
>
> Sorry, I garbled that a little. It's more correct to say: "because of
> limitations in the underlying MediaWiki application databases *that are
> the source of the edit data*, the data requires..."
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics