Any idea why the most viewed title in India, after Main_Page and
Special:Search, is "-"? CCing Dan Garry of the Discovery team.
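
One way to narrow this down might be to break the "-" rows out by project
and access method. Below is a rough sketch that reuses the filters from
Tilman's query quoted further down. It assumes that wmf.pageview_hourly has
an access_method column, and it has not been run, so treat it as a starting
point rather than a verified query.

```sql
-- Sketch only: split the views recorded under the "-" title by project
-- and access method, for the same month, country and agent type as the
-- quoted query.
-- Assumption: access_method is a column of wmf.pageview_hourly.
SELECT project,
       access_method,
       SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE year = 2015
  AND month = 12
  AND country = 'India'
  AND agent_type = 'user'
  AND page_title = '-'
GROUP BY project, access_method
ORDER BY views DESC
LIMIT 50;
```

If nearly all of the views came from a single project or access method,
that would suggest the "-" is a placeholder for a title that could not be
extracted, rather than an actual page.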

On Fri, Jan 22, 2016 at 5:13 PM, Tilman Bayer <tba...@wikimedia.org> wrote:

> Below is an example Hive query yielding the 50 most viewed pages in
> India during December 2015. It took less than 10 minutes of wall clock
> time to complete.
>
> SELECT CONCAT('https://',project,'.org/wiki/',page_title),
> SUM(view_count) AS views
> FROM wmf.pageview_hourly
> WHERE
>    year = 2015
>    AND month = 12
>    AND country = "India"
>    AND agent_type = "user"
> GROUP BY project, page_title
> ORDER BY views DESC LIMIT 50;
>
> ...
> Total MapReduce CPU Time Spent: 0 days 19 hours 13 minutes 2 seconds 930 msec
> OK
> _c0 views
> https://en.wikipedia.org/wiki/Main_Page 43515253
> https://en.wikipedia.org/wiki/Special:Search 4818687
> https://en.wikipedia.org/wiki/- 2650346
> https://en.wikipedia.org/wiki/Bajirao_I 1414810
> https://en.wikipedia.org/wiki/Dilwale_(2015_film) 1410015
> https://en.wikipedia.org/wiki/Mastani 1232964
> https://en.wikipedia.org/wiki/Bajirao_Mastani_(film) 1133261
> https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2015 632890
> https://en.wikipedia.org/wiki/Hate_Story_3 582816
> https://en.wikipedia.org/wiki/Special:MobileMenu 499379
> https://en.wikipedia.org/wiki/Star_Wars:_The_Force_Awakens 438113
> https://en.wikipedia.org/wiki/Tamasha_(film) 390519
> https://en.wikipedia.org/wiki/Prem_Ratan_Dhan_Payo 378133
> https://en.wikipedia.org/wiki/India 368946
> https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2016 335547
> https://en.wikipedia.org/wiki/Star_Wars 334326
> https://en.wikipedia.org/wiki/Sunny_Leone 333848
> https://en.wikipedia.org/wiki/Sundar_Pichai 329264
> https://en.wikipedia.org/wiki/Special:Book 324255
> https://en.wikipedia.org/wiki/List_of_highest-grossing_Bollywood_films 321418
> https://en.wikipedia.org/wiki/Salman_Khan 309113
> https://en.wikipedia.org/wiki/'Tis_the_Season 308221
> https://en.wikipedia.org/wiki/Mandana_Karimi 289662
> https://en.wikipedia.org/wiki/Kyaa_Kool_Hain_Hum_3 281801
> https://en.wikipedia.org/wiki/Kashibai 272673
> https://en.wikipedia.org/wiki/Bigg_Boss_9 272203
> https://en.wikipedia.org/wiki/Kriti_Sanon 266773
> https://en.wikipedia.org/wiki/2012_Delhi_gang_rape 265296
> https://en.wikipedia.org/wiki/Shah_Rukh_Khan 263729
> https://en.wikipedia.org/wiki/Neerja_Bhanot 259410
> https://en.wikipedia.org/wiki/Nora_Fatehi 252085
> https://en.wikipedia.org/wiki/Ashoka 250255
> https://en.wikipedia.org/wiki/B._K._S._Iyengar 248422
> https://en.wikipedia.org/wiki/2015_South_Indian_floods 246377
> https://en.wikipedia.org/wiki/Baahubali:_The_Beginning 244281
> https://en.wikipedia.org/wiki/Shamsher_Bahadur_I_(Krishna_Rao) 232122
> https://en.wikipedia.org/wiki/Christmas 228278
> https://en.wikipedia.org/wiki/Thanga_Magan_(2015_film) 222373
> https://en.wikipedia.org/wiki/Ranveer_Singh 221010
> https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam 220612
> https://en.wikipedia.org/wiki/Shivaji 218245
> https://en.wikipedia.org/wiki/Deepika_Padukone 218242
> https://en.wikipedia.org/wiki/TLC:_Tables,_Ladders_and_Chairs_(2015) 211920
> https://en.wikipedia.org/wiki/Gizele_Thakral 206585
> https://en.wikipedia.org/wiki/Urvashi_Rautela 204305
> https://en.wikipedia.org/wiki/Peshwa 194957
> https://en.wikipedia.org/wiki/Kajol 192044
> https://hi.wikipedia.org/wiki/मुखपृष्ठ 184274
> https://en.wikipedia.org/wiki/Quantico_(TV_series) 183112
> https://en.wikipedia.org/wiki/Mahatma_Gandhi 182336
> Time taken: 562.621 seconds, Fetched: 50 row(s)
>
>
> See also the discussion at https://phabricator.wikimedia.org/T120113
> (As mentioned there, a while ago I retrieved the global top 200 pages
> for a timespan of almost six months, with some wait time but no major
> issues. It's not quite clear to me why the "brute force" approach
> mentioned in the ticket failed, but I guess it had to do with the
> difficulty of repeating such a query for all projects - or countries -
> to generate top lists for every one of them.)
>
> On Wed, Jan 20, 2016 at 12:42 PM, Kevin Leduc <ke...@wikimedia.org> wrote:
> > +Analytics list so they can comment.
> >
> > I don't have such a script.  It's a pretty intensive job to compile top
> > articles especially over a month.  The pageview API was supposed to have
> > top articles per month per wiki but the job is so massive that it failed
> > to run in Hive.  Analytics knows there are better algorithms out there to
> > solve this problem.  So the pageview API just has top per day per wiki.
> >
> > I imagine that you are looking at some very specific wikis and
> > countries... not all of them.  Maybe someone on the list can make an
> > example hive script (given a wiki and country) that gives the top for a
> > day.
> >
> >
> > On Wed, Jan 20, 2016 at 12:23 PM, Dan Foy <d...@wikimedia.org> wrote:
> >>
> >> Hi Kevin,
> >>
> >> In your collection of scripts for Hive, do you have one that can act as
> >> a starting point for me to get the top N articles / URLs for Wikipedia
> >> in a country?
> >>
> >> Thanks,
> >> Dan
> >>
> >>
> >
> >
> > _______________________________________________
> > Analytics mailing list
> > Analytics@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
>
>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>