Re: [Analytics] Page hourly views

2018-02-12 Thread Bo Han
nd during the weekend an event happened that caused the rsync to stop > working. The issue should now be fixed. > > I opened https://phabricator.wikimedia.org/T187073 as an attempt to mitigate > the problem and add alarming. > > Luca > > 2018-02-12 7:08 GMT+01:00 Bo Han <b

Re: [Analytics] Page hourly views

2018-02-11 Thread Bo Han
iew files are not being posted, and > haven't been since Feb 9 17:08. > On Feb 11, 2018, at 4:48 PM, Nuria Ruiz <nu...@wikimedia.org> wrote: > >> Sorry, not sure we understand this question. Can you elaborate? >> >> On Sun, Feb 11, 2018 at 12:10 PM, Bo Han

[Analytics] Page hourly views

2018-02-11 Thread Bo Han
Hello, Is the process for generating the hourly pageview files backed up? Thank you ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Re: [Analytics] Pageviews dumps behind

2016-11-12 Thread Bo Han
ng datasets as up-to-date as possible, but they might be a bit > behind compared to usual frequency. > Thanks for having raised the concern, and for your understanding :) > Joseph > > On Sat, Nov 5, 2016 at 6:42 PM, Bo Han <bo.ning@gmail.com> wrote: > >> Hello, >> >>

[Analytics] Pageviews dumps behind

2016-11-05 Thread Bo Han
Hello, Is there maintenance going on right now? The pageview dumps seem to be behind: https://dumps.wikimedia.org/other/pageviews/2016/2016-11/ Thanks, Bo

Re: [Analytics] Requesting access to Wikimedia Pageview Dumps for Research

2016-03-02 Thread Bo Han
Hi, I noticed the maintenance email was announced at https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-March/001262.html but it'd be helpful to CC this list as well. Bo On Wed, Mar 2, 2016 at 11:26 AM, Toby Negrin wrote: > I believe the dumps server was

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Bo Han
Thanks for the clarification, Joseph. Bo On Tue, Mar 1, 2016 at 2:02 PM, Joseph Allemandou wrote: > Hi Again, > > @Dan: We will indeed reload data into cassandra. > > @Bo: Actually the two datasets are fairly different. > > The one called pagecounts is slowly getting

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Bo Han
ut shouldn't be impacted by the encoding issue > since page_title is not decoded :) > Files I expect to have changed are the new version of pageviews: > http://dumps.wikimedia.org/other/pageviews/2016/2016-02/ > Joseph > > On Tue, Mar 1, 2016 at 9:52 PM, Bo Han <bo.ning@gmail.com>

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Bo Han
Hi, Would you mind linking the bug fix here? I couldn't find it on phabricator. Thanks, Bo On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou wrote: > Hey Oliver, > It depends on what data you've used: if page_title or other 'encoding > sensitive' data (I can't think

[Analytics] Pagecounts dumps page title UTF-8 escaping

2016-02-03 Thread Bo Han
Hello, I have a question about how page titles are escaped in the pagecounts dumps as found at http://dumps.wikimedia.org/other/pagecounts-all-sites/ and http://dumps.wikimedia.org/other/pagecounts-raw/. I'm wondering for a particular page title, what is the set of escaped page titles in the
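For readers with the same question, here is a minimal sketch of how one page title can appear under several percent-escaped spellings and how they can be collapsed back to a single form. It uses only Python's standard `urllib.parse`; the helper names `escaped_variants` and `normalize` are hypothetical, and the variants shown are illustrative assumptions, not a statement of what the pagecounts dumps actually contain.

```python
import re
from urllib.parse import quote, unquote


def escaped_variants(title):
    """Return a few plausible escaped spellings of one page title.

    Assumption: these spellings are for illustration only; the set that
    really occurs in the dumps depends on what clients requested.
    """
    raw = title.replace(" ", "_")      # MediaWiki-style underscores
    upper = quote(raw, safe="")        # UTF-8 percent-escapes, uppercase hex
    # Same escapes with lowercase hex digits, which some clients send.
    lower = re.sub(r"%[0-9A-F]{2}", lambda m: m.group(0).lower(), upper)
    return {raw, upper, lower}


def normalize(escaped):
    """Decode percent-escapes so that all variants compare equal."""
    return unquote(escaped, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    for variant in sorted(escaped_variants("Café au lait")):
        print(variant, "->", normalize(variant))
```

Grouping counts by `normalize(title)` rather than by the raw string is one way to aggregate such variants, assuming the titles in the files are valid UTF-8 escapes.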