I've just made a mirror of the Wikimedia pagecounts in a requester-pays
S3 bucket in the AWS cloud:

http://basekb.com/subjectiveEye/wikipedia_traffic_page_counts.php
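
A note for anyone who has not used a requester-pays bucket before: the
person downloading pays the request and transfer charges, so requests
must be signed with your own AWS credentials and must explicitly
acknowledge the charge. The following is only a minimal sketch of what
that looks like with boto3; the function name is made up for
illustration, and the bucket and key names are not given here, so they
are left as parameters.

```python
def fetch_requester_pays_object(bucket, key, client=None):
    """Download one object from a requester-pays bucket and return its bytes.

    `bucket` and `key` are placeholders supplied by the caller; the real
    names are documented on the page linked above, not in this sketch.
    """
    if client is None:
        # boto3 is a third-party package and needs AWS credentials configured.
        import boto3
        client = boto3.client("s3")
    # RequestPayer="requester" acknowledges that the caller pays the charges;
    # without it S3 refuses access to a requester-pays bucket.
    response = client.get_object(
        Bucket=bucket,
        Key=key,
        RequestPayer="requester",
    )
    return response["Body"].read()
```

Running this from an EC2 instance in the same region as the bucket
should keep the transfer cost low, which is the main reason for
processing the data inside AWS.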

This data in S3 can be used efficiently from a Hadoop cluster running in
AWS, and there is an open source package for doing so that requires
nothing more than your AWS credentials to get started:

https://github.com/paulhoule/telepath

With hourly hit statistics for all URIs in all Wikimedia projects,
this rich data set contains a wealth of information.
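
To give a feel for the data, here is a small sketch that totals hits
per page for one project. It assumes the mirror keeps the upstream
pagecounts line format of four space-separated fields (project code,
page title, hit count, bytes transferred); the function names are made
up for illustration and this is not code from telepath.

```python
from collections import Counter


def parse_pagecounts_line(line):
    """Parse one line, assumed to be 'project title count bytes'.

    Returns (project, title, count, size), or None if the line is malformed.
    """
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4:
        return None
    project, title, count, size = parts
    try:
        return project, title, int(count), int(size)
    except ValueError:
        return None


def top_pages(lines, project, n=10):
    """Sum hit counts per page title for one project; return the n largest."""
    totals = Counter()
    for line in lines:
        record = parse_pagecounts_line(line)
        if record is not None and record[0] == project:
            totals[record[1]] += record[2]
    return totals.most_common(n)
```

For a single downloaded hourly file, something like
top_pages(gzip.open(path, "rt", encoding="utf-8", errors="replace"), "en")
should work after importing gzip; for the whole archive a Hadoop job is
the better fit.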


-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   ontolo...@gmail.com

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
