Amir:

FYI, this data has a couple of caveats:

1) the "-" is pageviews for a page for which we cannot extract a title.

2) the data is very much affected by bot spikes (you can mitigate that by
filtering by agent_type="user", but a significant portion of bot
traffic is still not labeled as such).
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly#Changes_and_known_problems_since_2015-06-16

3) there are privacy considerations when the number of views is small:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews/Pageviews_by_country#Is_Pageviews_by_country_privacy_sensitive
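
To make the three caveats concrete, here is a minimal sketch of how they might be folded into a per-country query. It is not the Hive query from this thread: it runs plain SQL against an in-memory SQLite table from Python so it can be tried anywhere, and it swaps the named_struct trick for a join on the per-country maximum. The column names come from the thread; the sample rows, the "spider" label, the function name and the 1000 threshold (the figure floated below) are assumptions for illustration only. On the real table you would also keep the year/month/day partition filters.

```python
import sqlite3

MIN_VIEWS = 1000  # assumed threshold, taken from the suggestion in this thread

QUERY = """
WITH articles_countries AS (
    SELECT country, page_title, SUM(view_count) AS views
    FROM pageview_hourly
    WHERE agent_type = 'user'              -- caveat 2: drop labeled bot traffic
      AND page_title <> '-'                -- caveat 1: drop pages with no title
    GROUP BY country, page_title
    HAVING SUM(view_count) >= :min_views   -- caveat 3: drop small counts
),
top_views AS (
    SELECT country, MAX(views) AS views
    FROM articles_countries
    GROUP BY country
)
SELECT a.country, a.page_title, a.views
FROM articles_countries AS a
JOIN top_views AS t
  ON a.country = t.country AND a.views = t.views
ORDER BY a.country, a.page_title
"""


def top_articles(rows, min_views=MIN_VIEWS):
    """Return (country, page_title, views) for the top article per country.

    rows: iterable of (country, page_title, agent_type, view_count) tuples.
    Ties for first place yield one result row per tied article.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pageview_hourly ("
        "country TEXT, page_title TEXT, agent_type TEXT, view_count INTEGER)"
    )
    conn.executemany("INSERT INTO pageview_hourly VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, {"min_views": min_views}).fetchall()
    conn.close()
    return result


# Made-up sample data, only to exercise the filters.
SAMPLE_ROWS = [
    ("Germany", "-", "user", 5000),
    ("Germany", "Berlin", "user", 900),
    ("Germany", "Berlin", "user", 400),
    ("Germany", "Main_Page", "spider", 9000),
    ("Germany", "Main_Page", "user", 1200),
    ("Iceland", "Reykjavik", "user", 300),
]

print(top_articles(SAMPLE_ROWS))
```

With the sample rows, the "-" entry and the bot-labeled rows are ignored, and any country whose top article falls under the threshold drops out of the result entirely. Note that the agent_type filter only removes traffic already labeled as bots, so unlabeled bot spikes can still distort the ranking.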


> Is anything like this already published anywhere? If it isn't, it may be
> nice to publish such a thing, similarly to Google Zeitgeist.

We do not have immediate plans to do so, due to privacy considerations. That
said, Dario's team has a project in this area that might produce datasets to
be published this year:
https://meta.wikimedia.org/wiki/Research:Quantifying_the_global_attention_to_public_health_threats_through_Wikipedia_pageview_data

See also:
https://phabricator.wikimedia.org/T189339

Thanks,

Nuria

On Mon, Jul 9, 2018 at 5:41 AM, Amir E. Aharoni <
amir.ahar...@mail.huji.ac.il> wrote:

> Thanks. Another question: For some countries, the result is "-", for
> example Germany:
>
> Germany    -    en.wikipedia    1275634
>
> Any idea why?
>
> (I modified the query a bit and added the "project" column. And yes, the
> fact that en.wikipedia is at the top in Germany is also quite odd.)
>
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> ‪“We're living in pieces,
> I want to live in peace.” – T. Moore‬
>
> 2018-07-09 15:17 GMT+03:00 Francisco Dans <fd...@wikimedia.org>:
>
>> I think as long as you add a filter requiring a minimum of maybe 1000
>> pageviews, you should be fine privacy-wise. I can't speak much to your
>> second question.
>>
>> On Mon, Jul 9, 2018 at 1:59 PM, Amir E. Aharoni <
>> amir.ahar...@mail.huji.ac.il> wrote:
>>
>>> Thank you so much! In many countries it's
>>>
>>> A couple of questions:
>>> 1. Are any of the results of this query private? Or can I talk about
>>> them to people?
>>> 2. Is anything like this already published anywhere? If it isn't, it may
>>> be nice to publish such a thing, similarly to Google Zeitgeist.
>>>
>>>
>>> --
>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>> http://aharoni.wordpress.com
>>> ‪“We're living in pieces,
>>> I want to live in peace.” – T. Moore‬
>>>
>>> 2018-07-09 13:19 GMT+03:00 Francisco Dans <fd...@wikimedia.org>:
>>>
>>>> Hi Amir,
>>>>
>>>> As Tilman has suggested, your best bet is to query the pageview_hourly
>>>> table. I was going to be lazy and give you a query to just find out the
>>>> most viewed article for a given country, but then I made a few experiments
>>>> and this is the query I came up with to generate a list of countries and
>>>> their respective most viewed articles and view counts. It takes a few
>>>> minutes to run for a single day, so I'm sure someone here could suggest a
>>>> better approach.
>>>>
>>>>> WITH articles_countries AS (
>>>>>     SELECT country, page_title, sum(view_count) AS views
>>>>>     FROM pageview_hourly
>>>>>     WHERE year=2018 AND month=3 AND day=15
>>>>>     GROUP BY country, page_title
>>>>> )
>>>>> SELECT s.country AS country, s.page_title AS page_title, s.views AS views
>>>>> FROM (
>>>>>     SELECT max(named_struct('views', views, 'country', country, 'page_title', page_title)) AS s
>>>>>     FROM articles_countries
>>>>>     GROUP BY country
>>>>> ) t;
>>>>
>>>>
>>>> Cheers / see you in ZA,
>>>> Fran
>>>>
>>>>
>>>> On Mon, Jul 9, 2018 at 10:18 AM, Amir E. Aharoni <
>>>> amir.ahar...@mail.huji.ac.il> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a way to find what are the most popular articles per country?
>>>>>
>>>>> Finding the most popular articles per language is easy with the
>>>>> Pageviews tool, but languages and countries are of course not the same.
>>>>>
>>>>> One thing I tried is going to Turnilo, webrequest_sampled_128, and
>>>>> filtering by country. But here it gets troublesome:
>>>>> * Splitting can be done by Uri host, which is *more or less* the
>>>>> project, or by Uri path, which is *more or less* the article (but see
>>>>> below), and I couldn't find a convenient way to combine them.
>>>>> * Mobile (.m.) and desktop hosts are separate. It may actually
>>>>> sometimes be useful to see differences (or lack thereof) between desktop
>>>>> and mobile, but combining them is often useful, too. This can probably be
>>>>> done with regular expressions, but this brings us to the biggest problem:
>>>>> * Filtering by Uri path would be useful if it didn't have so many
>>>>> paths for images, beacons, etc. Filtering using the regular expression
>>>>> "\/wiki\/.+" may be the right thing functionally, but in practice it's
>>>>> very slow or doesn't work at all.
>>>>> * I don't know what exactly is logged in webrequest_sampled_128, but
>>>>> the name hints that it doesn't include everything. A sample may be OK for
>>>>> countries with a lot of traffic like U.S. or Spain, but for countries with
>>>>> smaller traffic this may start being a problem.
>>>>>
>>>>> Any better ideas?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>>>> http://aharoni.wordpress.com
>>>>> ‪“We're living in pieces,
>>>>> I want to live in peace.” – T. Moore‬
>>>>>
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> Analytics@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Francisco Dans*
>>>> Software Engineer, Analytics Team
>>>> Wikimedia Foundation
>>
>> --
>> *Francisco Dans*
>> Software Engineer, Analytics Team
>> Wikimedia Foundation