I was looking at the DBpedia 3.9 files today and noticed that
redirects are not available for Wikipedia editions other than "en",
and I'm wondering why that is.

      Lately I've cooked down the Wikipedia pagecounts to produce a
"3D" data set that summarizes interest in topics (uh, hits to URIs)
over the 2008-2013 time frame. The source code for this is at

http://github.com/paulhoule/telepath

     This data has all sorts of problems, but probably the worst of
them is that it is chock full of URIs that don't correspond to DBpedia
concepts, for instance "Justin_Bieber_Die_Die_Die_Die Die". These come
from people creating junk pages on Wikipedia that get deleted, people
mistyping URIs, etc.

     The obvious thing to do is to filter out topics that don't exist
in DBpedia and also to resolve redirects so that people who visited
"Communists" get credited as visiting "Communism" and so forth.
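     A minimal sketch of that filter-and-redirect step in Python. It
assumes the pagecounts have already been reduced to (title, hits)
pairs, and that the set of valid titles and a redirect map are already
in memory, with the redirects resolved transitively so every source
points straight at its final target. `credit_hits` is a hypothetical
name, not something taken from telepath.

```python
from collections import Counter
from typing import Iterable, Mapping, Set, Tuple


def credit_hits(
    counts: Iterable[Tuple[str, int]],
    valid_titles: Set[str],
    redirects: Mapping[str, str],
) -> Counter:
    """Credit hits to redirect targets, then drop titles that aren't DBpedia concepts.

    Assumes `redirects` is already transitive (source -> final target),
    so a single lookup per title is enough.
    """
    totals: Counter = Counter()
    for title, hits in counts:
        # Follow the redirect if there is one; otherwise keep the title as-is.
        target = redirects.get(title, title)
        # Junk pages, deleted pages and typos won't be in the valid set.
        if target in valid_titles:
            totals[target] += hits
    return totals


totals = credit_hits(
    [("Communists", 5), ("Communism", 10), ("Justin_Bieber_Die_Die_Die_Die Die", 3)],
    valid_titles={"Communism"},
    redirects={"Communists": "Communism"},
)
print(totals)
```

     Resolving before filtering matters here: "Communists" need not be
in the valid set itself, as long as its target is.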

     I think a good list of valid DBpedia URIs can be had from the
list of page IDs in "en", and redirects can be gotten out of the
"transitive redirects". "en" is responsible for about 1/3 of the views
these days, but it would be really fun to have something that works
in all culture zones so we can see what is "big in Japan" and so forth.
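     Reading those two dumps could look something like this. It assumes
N-Triples input with one statement per line, using the
http://dbpedia.org/ontology/wikiPageID and
http://dbpedia.org/ontology/wikiPageRedirects predicates; the regex is a
shortcut, not a full N-Triples parser, and `load_dumps` is a
hypothetical helper. The resource prefix is the "en" one, so other
editions would need their own, and titles may still need
percent-decoding to line up with the pagecount titles.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Assumed layout: <subject> <predicate> object .   (one triple per line)
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$")
RESOURCE = "http://dbpedia.org/resource/"  # "en" prefix; an assumption for other editions
PAGE_ID = "http://dbpedia.org/ontology/wikiPageID"
REDIRECT = "http://dbpedia.org/ontology/wikiPageRedirects"


def local_name(iri: str) -> str:
    """Strip the resource prefix so IRIs can be compared with pagecount titles."""
    return iri[len(RESOURCE):] if iri.startswith(RESOURCE) else iri


def load_dumps(
    page_id_lines: Iterable[str], redirect_lines: Iterable[str]
) -> Tuple[Set[str], Dict[str, str]]:
    """Return (valid titles, transitive redirect map) from the two dumps."""
    valid: Set[str] = set()
    for line in page_id_lines:
        m = TRIPLE.match(line)
        if m and m.group(2) == PAGE_ID:
            valid.add(local_name(m.group(1)))

    redirects: Dict[str, str] = {}
    for line in redirect_lines:
        m = TRIPLE.match(line)
        # Only keep triples whose object is an IRI (the redirect target).
        if m and m.group(2) == REDIRECT and m.group(3).startswith("<"):
            redirects[local_name(m.group(1))] = local_name(m.group(3).strip("<>"))
    return valid, redirects


valid, redirects = load_dumps(
    ['<http://dbpedia.org/resource/Communism> <http://dbpedia.org/ontology/wikiPageID> '
     '"5880"^^<http://www.w3.org/2001/XMLSchema#integer> .'],
    ['<http://dbpedia.org/resource/Communists> <http://dbpedia.org/ontology/wikiPageRedirects> '
     '<http://dbpedia.org/resource/Communism> .'],
)
print(valid, redirects)
```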

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
