Re: [Analytics] Echo schema eventlogging

2016-03-01 Thread Roan Kattouw
[Reviving old thread] I was looking at our EventLogging data today, and discovered that Schema:Edit contains no useful information that isn't already in the database apart from which button people use to thank each other, and if we really care about that we can measure it separately without

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Tilman Bayer
Thanks Joseph! Is it reasonable to assume that the aggregate data in projectview_hourly has not been affected? On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou wrote: > Hey Oliver, > It depends

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Bo Han
Thanks for the clarification, Joseph. Bo On Tue, Mar 1, 2016 at 2:02 PM, Joseph Allemandou wrote: > Hi Again, > > @Dan: We will indeed reload data into cassandra. > > @Bo: Actually the two datasets are fairly different. > > The one called pagecounts is slowly getting

Re: [Analytics] Please provide feedback on suggested improvements to the Code of Conduct

2016-03-01 Thread Matthew Flaschen
See discussion at https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Seems_to_be_a_lot_of_unanswered_questions_and_gray_areas . Matt On 02/23/2016 09:37 PM, regu...@gmail.com wrote: I notice you mention in a lot of places that people should contact an administrator. What if the person

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
Hi Again, @Dan: We will indeed reload data into cassandra. @Bo: Actually the two datasets are fairly different. The one called pagecounts is slowly getting deprecated in favor of the one called pageview, defined by Research people at WMF: https://meta.wikimedia.org/wiki/Research:Page_view The

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Bo Han
Thanks Joseph. Am I correct in saying that the counts in pageviews are just the aggregated counts for decoded page titles from pagecounts-all-sites? Bo On Tue, Mar 1, 2016 at 1:39 PM, Joseph Allemandou wrote: > Hi ! > pagecounts are regenerated but shouldn't be

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
Hi ! pagecounts are regenerated but shouldn't be impacted by the encoding issue since page_title is not decoded :) Files I expect to have changed are the new version of pageviews: http://dumps.wikimedia.org/other/pageviews/2016/2016-02/ Joseph On Tue, Mar 1, 2016 at 9:52 PM, Bo Han

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Bo Han
Hi, Would you mind linking the bug fix here? I couldn't find it on phabricator. Thanks, Bo On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou wrote: > Hey Oliver, > It depends on what data you've used: if page_title or other 'encoding > sensitive' data (I can't think

Re: [Analytics] [Ops] Dark traffic

2016-03-01 Thread Dario Taraborelli
hey Andrew, we're monitoring the impact of this change (which we rolled out on 2/22) with a number of external partners (BBC, Le Monde, JSTOR, Elsevier) and we're planning to write a full report in April. Elsevier reported that visible inbound traffic from Wikipedia dropped by 99% in June

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
Hey Oliver, It depends on what data you've used: if page_title or other 'encoding sensitive' data (I can't think of any other, but ...) is part of it, then yes, you should ! On Tue, Mar 1, 2016 at 3:27 PM, Oliver Keyes wrote: > Hey Joseph, > > Thanks for letting us know.

Re: [Analytics] [Ops] Dark traffic

2016-03-01 Thread Andrew Lih
Thanks James, Dan, Chris and all for the quick answer. Nice to see this change. As Alex Stinson pointed out in the Phabricator discussion, it helps with our GLAM partners so they can keep tracking how much referral traffic comes from WM projects. -Andrew On Tue, Mar 1, 2016 at 10:02 AM, Chris

Re: [Analytics] Dark traffic

2016-03-01 Thread Federico Leva (Nemo)
James Forrester, 01/03/2016 15:59: to be more of a "good citizen" of the Internet ...people should make their websites HTTPS. Nemo

Re: [Analytics] Dark traffic

2016-03-01 Thread James Forrester
On Tuesday, 1 March 2016, Andrew Lih wrote: > Hi folks, I got this note from an external organization that wanted to > know more about what Wikimedia changed so that they are now accurately > getting referral info. Any pointers? > > "Wikipedia was implementing a fix so it

[Analytics] Dark traffic

2016-03-01 Thread Andrew Lih
Hi folks, I got this note from an external organization that wanted to know more about what Wikimedia changed so that they are now accurately getting referral info. Any pointers? "Wikipedia was implementing a fix so it would not be “dark traffic” in the analytics reports. This has been happening

[Analytics] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
Hi, *TL;DR: Please don't use hive / spark / hadoop before next week.* Last week the Analytics Team performed an upgrade to the Hadoop Cluster. It went reasonably well, except that many of the Hadoop processes were launched with a special option to NOT use UTF-8 as the default encoding. This issue