Hi Amir,

As Tilman suggested, your best bet is to query the pageview_hourly
table. I was going to be lazy and just give you a query for the most
viewed article in a single country, but after a few experiments I came
up with the query below, which lists every country together with its
most viewed article and that article's view count. It takes a few
minutes to run for a single day, so someone here can probably suggest a
faster approach.

WITH articles_countries AS (
    -- Total views per (country, page_title) for one day. Titles shared
    -- by different projects are summed together.
    SELECT country, page_title, sum(view_count) AS views
    FROM pageview_hourly
    WHERE year=2018 AND month=3 AND day=15
    GROUP BY country, page_title
)
-- 'views' is the first struct field, so max() keeps the most viewed
-- page in each country.
SELECT s.country AS country, s.page_title AS page_title, s.views AS views
FROM (
    SELECT max(named_struct('views', views, 'country', country,
                            'page_title', page_title)) AS s
    FROM articles_countries
    GROUP BY country
) t;
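The query's trick is taking max() over a struct whose first field is the view count, assuming (as the query does) that Hive orders structs field by field. Python tuples compare the same way, so the per-country selection can be sketched in memory. The function name and the sample rows here are made up for illustration; this is not how pageview_hourly is actually processed.

```python
# Minimal in-memory sketch of the query's logic, assuming rows shaped like
# (country, page_title, view_count). Sample data below is hypothetical.
from collections import defaultdict


def top_article_per_country(rows):
    """Return {country: (views, country, page_title)} for the top page."""
    # Step 1: sum view counts per (country, page_title), like the CTE
    totals = defaultdict(int)
    for country, page_title, view_count in rows:
        totals[(country, page_title)] += view_count

    # Step 2: per country, keep the max of (views, country, page_title).
    # Tuples compare element by element, so the highest view count wins
    # and ties fall back to the later fields, like the struct in the query.
    best = {}
    for (country, page_title), views in totals.items():
        candidate = (views, country, page_title)
        if country not in best or candidate > best[country]:
            best[country] = candidate
    return best


sample_rows = [
    ("ES", "Madrid", 5),
    ("ES", "Madrid", 3),
    ("ES", "Barcelona", 7),
    ("ZA", "Cape_Town", 4),
    ("ZA", "Johannesburg", 2),
]

for country, (views, _, page_title) in sorted(top_article_per_country(sample_rows).items()):
    print(country, page_title, views)
# prints:
# ES Madrid 8
# ZA Cape_Town 4
```

Because ties are broken by the remaining fields, two pages with equal views in one country resolve to the alphabetically larger title, both here and in the struct-based query.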


Cheers / see you in ZA,
Fran


On Mon, Jul 9, 2018 at 10:18 AM, Amir E. Aharoni <amir.ahar...@mail.huji.ac.il> wrote:

> Hi,
>
> Is there a way to find the most popular articles per country?
>
> Finding the most popular articles per language is easy with the Pageviews
> tool, but languages and countries are of course not the same.
>
> One thing I tried is going to Turnilo, webrequest_sampled_128, and
> filtering by country. But here it gets troublesome:
> * Splitting can be done by Uri host, which is *more or less* the project,
> or by Uri path, which is *more or less* the article (but see below), and I
> couldn't find a convenient way to combine them.
> * Mobile (.m.) and desktop hosts are separate. It may actually sometimes
> be useful to see differences (or lack thereof) between desktop and mobile,
> but combining them is often useful, too. This can probably be done with
> regular expressions, but this brings us to the biggest problem:
> * Filtering by Uri path would be useful if it didn't have so many paths
> for images, beacons, etc. Filtering using the regular expression
> "\/wiki\/.+" may be the right thing functionally, but in practice it's very
> slow or doesn't work at all.
> * I don't know what exactly is logged in webrequest_sampled_128, but the
> name hints that it doesn't include everything. A sample may be OK for
> countries with a lot of traffic like U.S. or Spain, but for countries with
> smaller traffic this may start being a problem.
>
> Any better ideas?
>
> Thanks!
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces,
> I want to live in peace.” – T. Moore
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


-- 
*Francisco Dans*
Software Engineer, Analytics Team
Wikimedia Foundation