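Any idea why the most popular article in India is "-"? It ranks third in the list below, right after Main_Page and Special:Search, with 2,650,346 views. CCing Dan Garry of the Discovery team.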
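
A variation of the query quoted below might help narrow this down by splitting the views of the "-" title by access method. This is only a sketch: it assumes wmf.pageview_hourly exposes an access_method column, which does not appear in the quoted query, and that the project value for English Wikipedia is 'en.wikipedia', as the URLs in the quoted output suggest.

```sql
-- Sketch: break down December 2015 views of the "-" title from India.
-- ASSUMPTION: access_method is a column of wmf.pageview_hourly; it is not
-- used in the quoted query, so check the table schema before running.
SELECT access_method,
  SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE
  year = 2015
  AND month = 12
  AND country = 'India'
  AND agent_type = 'user'
  AND project = 'en.wikipedia'
  AND page_title = '-'
GROUP BY access_method
ORDER BY views DESC;
```

If nearly all of the views fall under a single access method, that would point at one client or logging path rather than at readers actually requesting a page called "-".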
On Fri, Jan 22, 2016 at 5:13 PM, Tilman Bayer <tba...@wikimedia.org> wrote:

> Below is an example Hive query yielding the 50 most viewed pages in
> India during December 2015. It took less than 10 minutes of wall clock
> time to complete.
>
> SELECT CONCAT('https://',project,'.org/wiki/',page_title),
>   SUM(view_count) AS views
> FROM wmf.pageview_hourly
> WHERE
>   year = 2015
>   AND month = 12
>   AND country = "India"
>   AND agent_type = "user"
> GROUP BY project, page_title
> ORDER BY views DESC LIMIT 50;
>
> ...
> Total MapReduce CPU Time Spent: 0 days 19 hours 13 minutes 2 seconds 930 msec
> OK
> _c0 views
> https://en.wikipedia.org/wiki/Main_Page 43515253
> https://en.wikipedia.org/wiki/Special:Search 4818687
> https://en.wikipedia.org/wiki/- 2650346
> https://en.wikipedia.org/wiki/Bajirao_I 1414810
> https://en.wikipedia.org/wiki/Dilwale_(2015_film) 1410015
> https://en.wikipedia.org/wiki/Mastani 1232964
> https://en.wikipedia.org/wiki/Bajirao_Mastani_(film) 1133261
> https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2015 632890
> https://en.wikipedia.org/wiki/Hate_Story_3 582816
> https://en.wikipedia.org/wiki/Special:MobileMenu 499379
> https://en.wikipedia.org/wiki/Star_Wars:_The_Force_Awakens 438113
> https://en.wikipedia.org/wiki/Tamasha_(film) 390519
> https://en.wikipedia.org/wiki/Prem_Ratan_Dhan_Payo 378133
> https://en.wikipedia.org/wiki/India 368946
> https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2016 335547
> https://en.wikipedia.org/wiki/Star_Wars 334326
> https://en.wikipedia.org/wiki/Sunny_Leone 333848
> https://en.wikipedia.org/wiki/Sundar_Pichai 329264
> https://en.wikipedia.org/wiki/Special:Book 324255
> https://en.wikipedia.org/wiki/List_of_highest-grossing_Bollywood_films 321418
> https://en.wikipedia.org/wiki/Salman_Khan 309113
> https://en.wikipedia.org/wiki/'Tis_the_Season 308221
> https://en.wikipedia.org/wiki/Mandana_Karimi 289662
> https://en.wikipedia.org/wiki/Kyaa_Kool_Hain_Hum_3 281801
> https://en.wikipedia.org/wiki/Kashibai 272673
> https://en.wikipedia.org/wiki/Bigg_Boss_9 272203
> https://en.wikipedia.org/wiki/Kriti_Sanon 266773
> https://en.wikipedia.org/wiki/2012_Delhi_gang_rape 265296
> https://en.wikipedia.org/wiki/Shah_Rukh_Khan 263729
> https://en.wikipedia.org/wiki/Neerja_Bhanot 259410
> https://en.wikipedia.org/wiki/Nora_Fatehi 252085
> https://en.wikipedia.org/wiki/Ashoka 250255
> https://en.wikipedia.org/wiki/B._K._S._Iyengar 248422
> https://en.wikipedia.org/wiki/2015_South_Indian_floods 246377
> https://en.wikipedia.org/wiki/Baahubali:_The_Beginning 244281
> https://en.wikipedia.org/wiki/Shamsher_Bahadur_I_(Krishna_Rao) 232122
> https://en.wikipedia.org/wiki/Christmas 228278
> https://en.wikipedia.org/wiki/Thanga_Magan_(2015_film) 222373
> https://en.wikipedia.org/wiki/Ranveer_Singh 221010
> https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam 220612
> https://en.wikipedia.org/wiki/Shivaji 218245
> https://en.wikipedia.org/wiki/Deepika_Padukone 218242
> https://en.wikipedia.org/wiki/TLC:_Tables,_Ladders_and_Chairs_(2015) 211920
> https://en.wikipedia.org/wiki/Gizele_Thakral 206585
> https://en.wikipedia.org/wiki/Urvashi_Rautela 204305
> https://en.wikipedia.org/wiki/Peshwa 194957
> https://en.wikipedia.org/wiki/Kajol 192044
> https://hi.wikipedia.org/wiki/मुखपृष्ठ 184274
> https://en.wikipedia.org/wiki/Quantico_(TV_series) 183112
> https://en.wikipedia.org/wiki/Mahatma_Gandhi 182336
> Time taken: 562.621 seconds, Fetched: 50 row(s)
>
>
> See also the discussion at https://phabricator.wikimedia.org/T120113
> (As mentioned there, a while ago I retrieved the global top 200 pages
> for a timespan of almost six months, with some wait time but no major
> issues. It's not quite clear to me why the "brute force" approach
> mentioned in the ticket failed, but I guess it had to do with the
> difficulty of repeating such a query for all projects - or countries -
> to generate top lists for every one of them.)
>
> On Wed, Jan 20, 2016 at 12:42 PM, Kevin Leduc <ke...@wikimedia.org> wrote:
> > +Analytics list so they can comment.
> >
> > I don't have such a script. It's a pretty intensive job to compile top
> > articles especially over a month. The pageview API was supposed to have
> > top articles per month per wiki but the job is so massive that it failed
> > to run in Hive. Analytics knows there are better algorithms out there to
> > solve this problem. So the pageview API just has top per day per wiki.
> >
> > I imagine that you are looking at some very specific wikis and
> > countries... not all of them. Maybe someone on the list can make an
> > example hive script (given a wiki and country) that gives the top for a
> > day.
> >
> >
> > On Wed, Jan 20, 2016 at 12:23 PM, Dan Foy <d...@wikimedia.org> wrote:
> >>
> >> Hi Kevin,
> >>
> >> In your collection of scripts for Hive, do you have one that can act as
> >> a starting point for me to get the top N articles / URLs for Wikipedia
> >> in a country?
> >>
> >> Thanks,
> >> Dan
> >>
> >>
> >
> >
> > _______________________________________________
> > Analytics mailing list
> > Analytics@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>