Hi Amir,

As Tilman suggested, your best bet is to query the pageview_hourly table. I was going to be lazy and just give you a query for the most viewed article in a given country, but after a few experiments I came up with the query below, which generates a list of countries with their respective most viewed articles and view counts. It takes a few minutes to run for a single day, so I'm sure someone here could suggest a better approach.
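In case the max(named_struct(...)) part of the query looks odd: as far as I understand it, Hive compares structs field by field, so putting views first means the maximum struct per country is the row with the most views, and the other fields come along for free. Here is a minimal Python sketch of the same idea using tuples, which compare the same way. The rows and the function name are made up for illustration; they are not real pageview data.

```python
# Hypothetical (country, page_title, views) rows standing in for the
# aggregated articles_countries result; the numbers are invented.
rows = [
    ("ZA", "Main_Page", 120),
    ("ZA", "Nelson_Mandela", 340),
    ("ES", "Main_Page", 90),
    ("ES", "Madrid", 75),
]


def top_article_per_country(rows):
    """Return {country: (page_title, views)} for the most viewed title."""
    best = {}
    for country, page_title, views in rows:
        # Views go first so the comparison is decided by view count;
        # the remaining fields only break ties.
        candidate = (views, country, page_title)
        if country not in best or candidate > best[country]:
            best[country] = candidate
    return {c: (title, views) for c, (views, _, title) in best.items()}


print(top_article_per_country(rows))
```

One side effect of this approach is that ties on view count are broken by the later fields rather than arbitrarily, so the result is deterministic.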
WITH articles_countries AS (
    SELECT country, page_title, sum(view_count) AS views
    FROM pageview_hourly
    WHERE year=2018 AND month=3 AND day=15
    GROUP BY country, page_title
)
SELECT s.country as country, s.page_title as page_title, s.views as views
FROM (
    SELECT max(named_struct('views', views, 'country', country,
        'page_title', page_title)) as s from articles_countries group by country
) t;

Cheers / see you in ZA,
Fran

On Mon, Jul 9, 2018 at 10:18 AM, Amir E. Aharoni <amir.ahar...@mail.huji.ac.il> wrote:

> Hi,
>
> Is there a way to find what are the most popular articles per country?
>
> Finding the most popular articles per language is easy with the Pageviews
> tool, but languages and countries are of course not the same.
>
> One thing I tried is going to Turnilo, webrequest_sampled_128, and
> filtering by country. But here it gets troublesome:
> * Splitting can be done by Uri host, which is *more or less* the project,
>   or by Uri path, which is *more or less* the article (but see below), and
>   I couldn't find a convenient way to combine them.
> * Mobile (.m.) and desktop hosts are separate. It may actually sometimes
>   be useful to see differences (or lack thereof) between desktop and
>   mobile, but combining them is often useful, too. This can probably be
>   done with regular expressions, but this brings us to the biggest problem:
> * Filtering by Uri path would be useful if it didn't have so many paths
>   for images, beacons, etc. Filtering using the regular expression
>   "\/wiki\/.+" may be the right thing functionally, but in practice it's
>   very slow or doesn't work at all.
> * I don't know what exactly is logged in webrequest_sampled_128, but the
>   name hints that it doesn't include everything. A sample may be OK for
>   countries with a lot of traffic like U.S. or Spain, but for countries
>   with smaller traffic this may start being a problem.
>
> Any better ideas?
>
> Thanks!
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces,
> I want to live in peace.” – T. Moore
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>

--
*Francisco Dans*
Software Engineer, Analytics Team
Wikimedia Foundation