Thank you so much! In many countries it's A couple of questions: 1. Are any of the results of this query private? Or can I talk about them to people? 2. Is anything like this already published anywhere? If it isn't, it may be nice to publish such a thing, similarly to Google Zeitgeist.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore 2018-07-09 13:19 GMT+03:00 Francisco Dans <fd...@wikimedia.org>: > Hi Amir, > > As Tilman has suggested, your best bet is to query the pageview_hourly > table. I was going to be lazy and give you a query to just find out the > most viewed article for a given country, but then I made a few experiments > and this is the query I came up with to generate a list of countries and > their respective most viewed articles and view counts. It takes a few > minutes to run for a single day, so I'm sure someone here could suggest a > better approach. > > WITH articles_countries AS ( >> SELECT country, page_title, sum(view_count) AS views >> FROM pageview_hourly >> WHERE year=2018 AND month=3 AND day=15 >> GROUP BY country, page_title >> ) >> SELECT s.country as country, s.page_title as page_title, s.views as views >> FROM ( >> SELECT max(named_struct('views', views, 'country', country, >> 'page_title', page_title)) as s from articles_countries group by country >> ) t; > > > Cheers / see you in ZA, > Fran > > > On Mon, Jul 9, 2018 at 10:18 AM, Amir E. Aharoni < > amir.ahar...@mail.huji.ac.il> wrote: > >> Hi, >> >> Is there a way to find what are the most popular articles per country? >> >> Finding the most popular articles per language is easy with the Pageviews >> tool, but languages and countries are of course not the same. >> >> One thing I tried is going to Turnilo, webrequest_sampled_128, and >> filtering by country. But here it gets troublesome: >> * Splitting can be done by Uri host, which is *more or less* the project, >> or by Uri path, which is *more or less* the article (but see below), and I >> couldn't find a convenient way to combine them. >> * Mobile (.m.) and desktop hosts are separate. It may actually sometimes >> be useful to see differences (or lack thereof) between desktop and mobile, >> but combining them is often useful, too. This can probably be done with >> regular expressions, but this brings us to the biggest problem: >> * Filtering by Uri path would be useful if it didn't have so many paths >> for images, beacons, etc. Filtering using the regular expression >> "\/wiki\/.+" may be the right thing functionally, but in practice it's very >> slow or doesn't work at all. >> * I don't know what exactly is logged in webrequest_sampled_128, but the >> name hints that it doesn't include everything. A sample may be OK for >> countries with a lot of traffic like U.S. or Spain, but for countries with >> smaller traffic this may start being a problem. >> >> Any better ideas? >> >> Thanks! >> >> -- >> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי >> http://aharoni.wordpress.com >> “We're living in pieces, >> I want to live in peace.” – T. Moore >> >> _______________________________________________ >> Analytics mailing list >> Analytics@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/analytics >> >> > > > -- > *Francisco Dans* > Software Engineer, Analytics Team > Wikimedia Foundation > > _______________________________________________ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/analytics > >
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics