> From: Jakob Fix [mailto:jakob....@gmail.com]
> Sent: Friday, February 20, 2015 7:27 PM
> To: dbpedia-discussion
> Subject: [Dbpedia-discussion] question regarding retrieval of news organisations

Hi Jakob,

thanks for that interesting question! I'm copying FP7 Multisensor, since it may be of interest to them.
Jakob's question (and the ensuing discussion) can be found on:
- https://groups.google.com/forum/#!forum/thosch : I like this one a bit better
- https://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/

> I'd like to look up information in dbpedia.org that's related to news
> sources (websites, newspapers, media organisations, blogs, etc.).

Not the question you asked, but: check "rdf:type yago:SOMETHING" for some of the news orgs,
and chances are you'll find a type that identifies most of them.

> http://dbpedia.org/page/Haaretz for example, I see
> dbpedia-owl:wikiPageExternalLink, dbpprop:website, foaf:homepage which
> all have the website's link. Is there a canonical one to use?

- wikiPageExternalLink is ANY link mentioned in the article.
  If it's a host name without any path, then chances are that's the home page of the subject, but there's no guarantee.
- foaf:homepage must be the home page of the subject,
  though someone may have made a mistake in wikipedia or in a mapping.
- Also check foaf:page. It's "any" page related to the subject.
  It's a super-property of foaf:homepage, but I'm not sure that this reasoning is actually applied.
- dbpprop:website is a raw property that may be used in some wikipedia templates but not in others.

I think the best approach is to check all these methods over two sets:
- news orgs identified by yago:SOMETHING
- news orgs in your list of home pages

> http://dbpedia.org/page/Irish_Examiner) I see that there are
> dbprop:headquarters (contains a string) and dbpedia-owl:headquarter
> (contains a resource which is much more useful) from where I can then
> retrieve information about the city and the country.

dbprop is a raw property, dbpedia-owl is a mapped property.
- You can find all uses of that mapped property (i.e.
  all maps that use it) here:
  http://mappings.dbpedia.org/index.php?title=Special%3ASearch&redirs=0&search=%22ontologyproperty+headquarter%22
- You can find other ways to explore the ontology and mapping here:
  http://mappings.dbpedia.org/index.php/Exploring_the_Ontology
- Unfortunately there's no way to restrict to English mappings only.
  All variations of "mapping en" "ontologyproperty headquarter" failed.

Anyway, the map you want is here (but there's also "television channel" etc):
http://mappings.dbpedia.org/index.php?title=Mapping_en:Infobox_newspaper&action=edit
- I checked a few wikipedia instances and all use "headquarters" consistently
- I guess same for website->foaf:homepage

Back to your question: "dbprop:headquarters (contains a string),
dbpedia-owl:headquarter (contains a resource which is much more useful)".
- Yes, dbo:headquarter gets the links (because it's an ObjectProperty), dbp:headquarters gets any text
- E.g. see the split in http://dbpedia.org/page/Fit_TV:
    dbp:headquarters  132 (xsd:integer); 93213 (xsd:integer); Fax : 1 49 22 22 35; Tel : 1 49 22 20 01
    dbo:headquarter   dbpedia:Saint-Denis
- The ObjectProperty extractor takes ANY link, no matter what it signifies.
- Are there dbo:headquarter that are not Places?
  Yep, there are some thousands:
    select * {?x dbpedia-owl:headquarter ?y filter not exists {?y a dbpedia-owl:Place}}
- My favorite example is
    http://dbpedia.org/resource/Asheville_Citizen-Times dbo:headquarter http://dbpedia.org/resource/O._Henry
  I think that's a fictional newspaper devised by O. Henry ;-)
- BTW, for some nefarious reason http://dbpedia.org/resource/Madrid is not a Place :-)
  so let's refine the query:
    select * {
      ?x dbpedia-owl:headquarter ?y
      filter not exists {?y a dbpedia-owl:Place}
      filter (?y != dbpedia:Madrid)
    }

You may also want to check which fields are not yet mapped:
http://mappings.dbpedia.org/server/templatestatistics/en/?template=Infobox_newspaper
- should be interesting: 20 publishing_city, publishing_country
- these may also be interesting: ISSN, oclc
The description of all template fields is at: https://en.wikipedia.org/wiki/Template:Infobox_newspaper

> I have the impression that one cannot necessarily trust that certain
> information will be in dbpedia.org (not a problem, it obviously being
> a volunteer effort).

Exactly. Adding mappings is easy, and they appear on live.dbpedia.org shortly afterwards.

> I'm hoping, in this proof of concept, to use
> some sparql queries and dbpedia.org as it appears to have the richest
> information, but if there are other linked data endpoints that contain
> more/better information on news organisations across the world

Try wikidata.
- Pick a smaller newspaper (e.g. Chicago Defender)
- It's guaranteed to be found since wikidata reflects all of wikipedia: https://www.wikidata.org/wiki/Q961669
- But does it have an appropriate type? Yep, "instance of=newspaper"
- Does it have the home page? Nope.
That's the strength of DBpedia: it extracts a lot more props, even though not always perfectly.

Counting:
- How many newspapers in wikidata?
  http://vladimiralexiev.github.io/CH-names/README.html#sec-2-1-4 (as of Dec 2014) -> 6187
  Also see summary of some other counts, and a gist with all instance counts
- How many in wikidata today? Click a query here: http://wdq.wmflabs.org/wdq/ :
    CLAIM[31:11032]
  And then run it here:
  http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A11032%5D
  -> 6275
- How many with webpage?
    CLAIM[31:11032] AND CLAIM[856]
  http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A11032%5D%20AND%20CLAIM%5B856%5D
  -> only 646
- How many in dbpedia.org?
    select count(*) {?x a dbpedia-owl:Newspaper}
  -> 6043 (as of... hmm.. Aug 2014: pretty close to wikidata)
- How many with webpage?
    select count(*) {?x a dbpedia-owl:Newspaper. filter exists {?x foaf:homepage ?y}}
  -> 4666. Pretty good!
  -- Question: why did I not use this simpler query?
    select count(*) {?x a dbpedia-owl:Newspaper; foaf:homepage ?y}
- How many in live.dbpedia.org?
    select count(*) {?x a dbpedia-owl:Newspaper}
  -> 2583. OOPS. I mean ***OOPS***

Let me know if you find something better.
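
PS, on the question of "filter exists" versus the simpler pattern "?x a dbpedia-owl:Newspaper; foaf:homepage ?y": one likely reason is that the simpler pattern returns one row per (newspaper, homepage) pair, so a newspaper with two homepage values would be counted twice, while "filter exists" counts each newspaper once. Below is a minimal Python sketch of that difference. It uses made-up triples only: the ex: resources and example URLs are hypothetical and are not taken from DBpedia, and the code merely mimics the two queries rather than running SPARQL.

```python
# Toy illustration with made-up data: why a plain join can over-count.
# The "ex:" resources and URLs below are hypothetical, not real DBpedia triples.

triples = [
    ("ex:PaperA", "rdf:type", "dbpedia-owl:Newspaper"),
    ("ex:PaperA", "foaf:homepage", "http://a.example/"),
    ("ex:PaperA", "foaf:homepage", "http://www.a.example/"),  # second homepage value
    ("ex:PaperB", "rdf:type", "dbpedia-owl:Newspaper"),
    ("ex:PaperB", "foaf:homepage", "http://b.example/"),
    ("ex:PaperC", "rdf:type", "dbpedia-owl:Newspaper"),       # no homepage at all
]

# All subjects typed as newspapers.
newspapers = [s for s, p, o in triples
              if p == "rdf:type" and o == "dbpedia-owl:Newspaper"]


def homepages(subject):
    """Return every foaf:homepage value recorded for the given subject."""
    return [o for s, p, o in triples if s == subject and p == "foaf:homepage"]


# Mimics: select count(*) {?x a dbpedia-owl:Newspaper; foaf:homepage ?y}
# One result row per (newspaper, homepage) pair.
join_count = sum(len(homepages(x)) for x in newspapers)

# Mimics: select count(*) {?x a dbpedia-owl:Newspaper. filter exists {?x foaf:homepage ?y}}
# One result row per newspaper that has at least one homepage.
exists_count = sum(1 for x in newspapers if homepages(x))

print("join:", join_count, "filter exists:", exists_count)
```

With the simpler pattern, "select count(distinct ?x)" should give the same number as the "filter exists" form.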