Really good summary of the situation, Neil. I'm bookmarking this and will
re-use it when people ask :)

On Thu, Mar 22, 2018 at 7:07 AM, Neil Patel Quinn <nqu...@wikimedia.org>
wrote:

> On 22 March 2018 at 13:41, Neil Patel Quinn <nqu...@wikimedia.org> wrote:
>
>>
>> Both the edit data and pageview data that you're talking about come from
>> the Hadoop-based Analytics Data Lake
>> <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake>. However,
>> because of limitations in the underlying MediaWiki application databases
>> <https://www.mediawiki.org/wiki/Manual:Database_layout> *that Hive pulls
>> edit data from*, the data requires some complex reconstruction and
>> denormalization
>> <https://wikitech.wikimedia.org/wiki/Analytics/Systems/Data_Lake/Edits/Pipeline>
>> that takes several days to a week.
>>
>>
> Sorry, I garbled that a little. It's more correct to say: "because of
> limitations in the underlying MediaWiki application databases *that are
> the source of the edit data*, the data requires..."
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
