Amir: FYI, this data has a couple of caveats:
1) The "-" is pageviews for a page for which we cannot extract a title.
2) The data is very much affected by bot spikes (you can mitigate that by filtering by agent_type="user", but still, a significant portion of bot traffic is not labeled as such).
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly#Changes_and_known_problems_since_2015-06-16
3) There are privacy considerations when the number of views is small:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews/Pageviews_by_country#Is_Pageviews_by_country_privacy_sensitive

> Is anything like this already published anywhere? If it isn't, it may be nice to publish such a thing, similarly to Google Zeitgeist.

We do not have immediate plans to do so due to privacy considerations. That said, Dario's team has a project in this area that might yield datasets to be published this year:
https://meta.wikimedia.org/wiki/Research:Quantifying_the_global_attention_to_public_health_threats_through_Wikipedia_pageview_data

See also: https://phabricator.wikimedia.org/T189339

Thanks,

Nuria

On Mon, Jul 9, 2018 at 5:41 AM, Amir E. Aharoni <amir.ahar...@mail.huji.ac.il> wrote:

> Thanks. Another question: For some countries, the result is "-", for
> example Germany:
>
> Germany - en.wikipedia 1275634
>
> Any idea why?
>
> (I modified the query a bit and added the "project" column. And yes, the
> fact that en.wikipedia is at the top in Germany is also quite odd.)
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces,
> I want to live in peace.” – T. Moore
>
> 2018-07-09 15:17 GMT+03:00 Francisco Dans <fd...@wikimedia.org>:
>
>> I think as long as you put in a filter so that the minimum pageviews is
>> maybe 1000, you should be fine privacy wise. I can't speak too much to your
>> second question.
>>
>> On Mon, Jul 9, 2018 at 1:59 PM, Amir E. Aharoni <
>> amir.ahar...@mail.huji.ac.il> wrote:
>>
>>> Thank you so much!
>>> In many countries it's
>>>
>>> A couple of questions:
>>> 1. Are any of the results of this query private? Or can I talk about
>>> them to people?
>>> 2. Is anything like this already published anywhere? If it isn't, it may
>>> be nice to publish such a thing, similarly to Google Zeitgeist.
>>>
>>> 2018-07-09 13:19 GMT+03:00 Francisco Dans <fd...@wikimedia.org>:
>>>
>>>> Hi Amir,
>>>>
>>>> As Tilman has suggested, your best bet is to query the pageview_hourly
>>>> table. I was going to be lazy and give you a query to just find out the
>>>> most viewed article for a given country, but then I made a few experiments
>>>> and this is the query I came up with to generate a list of countries and
>>>> their respective most viewed articles and view counts. It takes a few
>>>> minutes to run for a single day, so I'm sure someone here could suggest a
>>>> better approach.
>>>>
>>>>> WITH articles_countries AS (
>>>>>     SELECT country, page_title, sum(view_count) AS views
>>>>>     FROM pageview_hourly
>>>>>     WHERE year=2018 AND month=3 AND day=15
>>>>>     GROUP BY country, page_title
>>>>> )
>>>>> SELECT s.country as country, s.page_title as page_title, s.views as views
>>>>> FROM (
>>>>>     SELECT max(named_struct('views', views, 'country', country,
>>>>>         'page_title', page_title)) as s from articles_countries group by country
>>>>> ) t;
>>>>
>>>> Cheers / see you in ZA,
>>>> Fran
>>>>
>>>> On Mon, Jul 9, 2018 at 10:18 AM, Amir E. Aharoni <
>>>> amir.ahar...@mail.huji.ac.il> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a way to find what are the most popular articles per country?
>>>>>
>>>>> Finding the most popular articles per language is easy with the
>>>>> Pageviews tool, but languages and countries are of course not the same.
>>>>>
>>>>> One thing I tried is going to Turnilo, webrequest_sampled_128, and
>>>>> filtering by country. But here it gets troublesome:
>>>>> * Splitting can be done by Uri host, which is *more or less* the
>>>>> project, or by Uri path, which is *more or less* the article (but see
>>>>> below), and I couldn't find a convenient way to combine them.
>>>>> * Mobile (.m.) and desktop hosts are separate. It may actually
>>>>> sometimes be useful to see differences (or lack thereof) between desktop
>>>>> and mobile, but combining them is often useful, too. This can probably be
>>>>> done with regular expressions, but this brings us to the biggest problem:
>>>>> * Filtering by Uri path would be useful if it didn't have so many
>>>>> paths for images, beacons, etc. Filtering using the regular expression
>>>>> "\/wiki\/.+" may be the right thing functionally, but in practice it's
>>>>> very slow or doesn't work at all.
>>>>> * I don't know what exactly is logged in webrequest_sampled_128, but
>>>>> the name hints that it doesn't include everything. A sample may be OK for
>>>>> countries with a lot of traffic like U.S. or Spain, but for countries with
>>>>> smaller traffic this may start being a problem.
>>>>>
>>>>> Any better ideas?
>>>>>
>>>>> Thanks!
>>>>
>>>> --
>>>> *Francisco Dans*
>>>> Software Engineer, Analytics Team
>>>> Wikimedia Foundation
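
For anyone picking this thread up later, here is one way Fran's query might be adjusted to fold in the suggestions made above: keep only agent_type = 'user', skip rows whose title is "-", carry the "project" column Amir mentioned, and drop anything under 1000 views. This is an untested sketch. The agent_type and project column names are taken from the messages in this thread rather than checked against the table schema, and 1000 is simply the floor Fran floated, not an official privacy threshold.

```sql
-- Sketch only: top article per country for one day, with the thread's caveats applied.
WITH articles_countries AS (
    SELECT country, project, page_title, SUM(view_count) AS views
    FROM pageview_hourly
    WHERE year = 2018 AND month = 3 AND day = 15
      AND agent_type = 'user'      -- mitigates bot spikes; unlabeled bots still get through
      AND page_title <> '-'        -- "-" means the title could not be extracted
    GROUP BY country, project, page_title
    HAVING SUM(view_count) >= 1000 -- assumed floor, per Fran's suggestion
)
SELECT s.country AS country, s.project AS project,
       s.page_title AS page_title, s.views AS views
FROM (
    -- max() over a struct compares fields in order, so 'views' must come first
    SELECT MAX(named_struct('views', views, 'country', country,
                            'project', project, 'page_title', page_title)) AS s
    FROM articles_countries
    GROUP BY country
) t;
```

Note that with the HAVING clause, a country where no single article reaches 1000 views on that day will not appear in the output at all.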
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics