One issue that came up in that analysis is the prevalence of "non-topic
topics", like the "List of X" pages. Non-topic topics are a great source
of information for extraction; see, for example,

http://en.wikipedia.org/wiki/ISO_3166-2:CN

or

http://en.wikipedia.org/wiki/List_of_South_Dakota_state_symbols

but for many purposes you might want to de-reify them because they aren't
"things" in the same sense that CN and US-SD are.

I don't think DBpedia is diving into tables but I've done a few small
projects getting data out of tables in specific areas.
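
For the "non-topic topics" mentioned above, a crude first pass is to filter on the page title. This is only a minimal sketch: the prefix list and the `is_non_topic` name are my own guesses for illustration, picked from the titles in the stats quoted below, and are not anything DBpedia ships.

```python
from urllib.parse import unquote

# Hypothetical prefix list, chosen from the titles in the quoted stats.
NON_TOPIC_PREFIXES = (
    "List_of_",
    "Lists_of_",
    "Index_of_",
    "Alphabetical_list_of_",
)
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def is_non_topic(uri: str) -> bool:
    """Guess whether a DBpedia resource URI names a list-like page."""
    uri = uri.strip("<>")  # accept N-Triples style <...> terms as well
    if not uri.startswith(DBPEDIA_RESOURCE):
        return False
    local_name = unquote(uri[len(DBPEDIA_RESOURCE):])
    return local_name.startswith(NON_TOPIC_PREFIXES)
```

This only catches titles with a list-like prefix. Pages such as 2013_in_film would need extra rules or category information.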



On Thu, Jan 29, 2015 at 8:08 AM, Magnus Knuth <magnus.kn...@hpi.uni-potsdam.de> wrote:

> Hi Jörn,
>
> we computed some graph measures based on the Wikipedia link structure.
> From a user's perspective, the actual link structure seemed more
> appropriate than the mapped properties. Therefore, we also had to clean up
> the page_links dataset; we then computed PageRank, HITS, and in-link and
> out-link degree for each article in Wikipedia. You can find the datasets
> for EN and DE, DBpedia 3.9 and 2014, at [1].
>
> Maybe that helps.
>
> Best
> Magnus
>
> [1] http://s16a.org/node/6
>
> On 28.01.2015 at 17:54, Jörn Hees <j_h...@cs.uni-kl.de> wrote:
>
> > Hi,
> >
> > it seems it's not as easy as i had thought to get the top subjects,
> > predicates and objects, as SPARQL queries such as this
> > ```
> > # top 10 subjects by number of triples
> > SELECT ?n (COUNT(*) AS ?c)
> > WHERE {
> >   ?n ?p ?o .
> > }
> > GROUP BY ?n
> > ORDER BY DESC(?c)
> > LIMIT 10
> > ```
> > just time out / return with partial results.
> >
> >
> > So i compiled them from the NT dumps, as described here (the full files
> > are linked there as well):
> >
> > https://joernhees.de/blog/2015/01/28/dbpedia-2014-stats-top-subjects-predicates-and-objects/
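
A minimal sketch of how such counts could be compiled from an N-Triples dump in one pass. It is not necessarily the approach from the blog post linked above. It assumes one triple per line and relies on subjects and predicates never containing whitespace; `count_terms` is a name made up for this example.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_terms(lines: Iterable[str]) -> Tuple[Counter, Counter, Counter]:
    """Count subject, predicate and object occurrences in N-Triples lines."""
    subjects, predicates, objects = Counter(), Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip malformed lines instead of failing
        subj, pred, rest = parts
        # The object is everything up to the terminating " ."
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()
        subjects[subj] += 1
        predicates[pred] += 1
        objects[obj] += 1
    return subjects, predicates, objects
```

Passing an open file object as `lines` and calling `most_common(10)` on each returned counter gives top-10 lists like the ones further down. The subject and object counters for a full dump hold millions of distinct terms, so memory may become the limit.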
> >
> > Thought this might be of interest to some of you.
> >
> >
> > Turns out there's actually quite a lot of duplicate triples in the dumps:
> >   4891 <http://commons.wikimedia.org/wiki/Special:FilePath/Flag_of_Slovenia.svg?width=300> <http://purl.org/dc/elements/1.1/rights> <http://en.wikipedia.org/wiki/File:Flag_of_Slovenia.svg> .
> >   4891 <http://commons.wikimedia.org/wiki/Special:FilePath/Flag_of_Slovenia.svg> <http://xmlns.com/foaf/0.1/thumbnail> <http://commons.wikimedia.org/wiki/Special:FilePath/Flag_of_Slovenia.svg?width=300> .
> >   4891 <http://commons.wikimedia.org/wiki/Special:FilePath/Flag_of_Slovenia.svg> <http://purl.org/dc/elements/1.1/rights> <http://en.wikipedia.org/wiki/File:Flag_of_Slovenia.svg> .
> >   1520 <http://commons.wikimedia.org/wiki/Special:FilePath/Naval_Ensign_of_the_United_Kingdom.svg?width=300> <http://purl.org/dc/elements/1.1/rights> <http://en.wikipedia.org/wiki/File:Naval_Ensign_of_the_United_Kingdom.svg> .
> >   1520 <http://commons.wikimedia.org/wiki/Special:FilePath/Naval_Ensign_of_the_United_Kingdom.svg> <http://xmlns.com/foaf/0.1/thumbnail> <http://commons.wikimedia.org/wiki/Special:FilePath/Naval_Ensign_of_the_United_Kingdom.svg?width=300> .
> >   1520 <http://commons.wikimedia.org/wiki/Special:FilePath/Naval_Ensign_of_the_United_Kingdom.svg> <http://purl.org/dc/elements/1.1/rights> <http://en.wikipedia.org/wiki/File:Naval_Ensign_of_the_United_Kingdom.svg> .
> >   1195 <http://commons.wikimedia.org/wiki/Special:FilePath/Airplane_silhouette.svg?width=300> <http://purl.org/dc/elements/1.1/rights> <http://en.wikipedia.org/wiki/File:Airplane_silhouette.svg> .
> >   1195 <http://commons.wikimedia.org/wiki/Special:FilePath/Airplane_silhouette.svg> <http://xmlns.com/foaf/0.1/thumbnail> <http://commons.wikimedia.org/wiki/Special:FilePath/Airplane_silhouette.svg?width=300> .
> >   1195 <http://commons.wikimedia.org/wiki/Special:FilePath/Airplane_silhouette.svg> <http://purl.org/dc/elements/1.1/rights> <http://en.wikipedia.org/wiki/File:Airplane_silhouette.svg> .
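
Duplicates like these can be found by counting textually identical lines. The following is only a sketch under that assumption: it treats two triples as duplicates only when their lines match exactly after trimming, and `duplicate_triples` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def duplicate_triples(lines: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Return up to `top` triples that occur more than once, most frequent first."""
    counts = Counter()
    for line in lines:
        triple = line.strip()
        if triple and not triple.startswith("#"):
            counts[triple] += 1
    # most_common() sorts by count, so filtering keeps that order
    return [(triple, n) for triple, n in counts.most_common() if n > 1][:top]
```

This keeps every distinct triple in memory, so for the full dumps an external sort, or running it per dump file, is probably the more practical route.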
> >
> >
> >
> > Top10 Subjects:
> >   8118 <http://dbpedia.org/resource/Alphabetical_list_of_communes_of_Italy>
> >   7110 <http://dbpedia.org/resource/List_of_places_in_Afghanistan>
> >   6162 <http://dbpedia.org/resource/Index_of_Andhra_Pradesh-related_articles>
> >   5857 <http://dbpedia.org/resource/List_of_populated_places_in_Bosnia_and_Herzegovina>
> >   5712 <http://dbpedia.org/resource/2013_in_film>
> >   5550 <http://dbpedia.org/resource/List_of_municipalities_of_Brazil>
> >   5458 <http://dbpedia.org/resource/List_of_dialling_codes_in_Germany>
> >   5405 <http://dbpedia.org/resource/IUCN_Red_List_vulnerable_species_(Plantae)>
> >   5392 <http://dbpedia.org/resource/List_of_CJK_Unified_Ideographs,_part_3_of_4>
> >   5392 <http://dbpedia.org/resource/List_of_CJK_Unified_Ideographs,_part_2_of_4>
> >   5392 <http://dbpedia.org/resource/List_of_CJK_Unified_Ideographs,_part_1_of_4>
> >
> > Top10 Predicates:
> > 149707899 <http://dbpedia.org/ontology/wikiPageWikiLink>
> > 86391520 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> > 33958849 <http://www.w3.org/2002/07/owl#sameAs>
> > 18731754 <http://purl.org/dc/terms/subject>
> > 13926391 <http://www.w3.org/2000/01/rdf-schema#label>
> > 13494896 <http://dbpedia.org/ontology/wikiPageRevisionID>
> > 13494875 <http://www.w3.org/ns/prov#wasDerivedFrom>
> > 13494819 <http://dbpedia.org/ontology/wikiPageID>
> > 10948106 <http://dbpedia.org/ontology/wikiPageOutDegree>
> > 10948106 <http://dbpedia.org/ontology/wikiPageLength>
> >
> > Top10 Objects:
> > 10948086 <http://xmlns.com/foaf/0.1/Document> .
> > 10948086 "en"^^<http://www.w3.org/2001/XMLSchema#string> .
> > 6239553 "1"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
> > 2250659 <http://dbpedia.org/class/yago/PhysicalEntity100001930> .
> > 2169386 <http://dbpedia.org/class/yago/Object100002684> .
> > 2155200 <http://www.w3.org/2002/07/owl#Thing> .
> > 1974654 <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent> .
> > 1974654 <http://dbpedia.org/ontology/Agent> .
> > 1816213 <http://dbpedia.org/class/yago/YagoLegalActorGeo> .
> > 1650316 <http://xmlns.com/foaf/0.1/Person> .
> > 1649647 <http://wikidata.dbpedia.org/resource/Q5> .
> > 1649647 <http://wikidata.dbpedia.org/resource/Q215627> .
> > 1649647 <http://schema.org/Person> .
> >
> >
> > Cheers,
> > Jörn
> >
> >
> >
> > ------------------------------------------------------------------------------
> > Dive into the World of Parallel Programming. The Go Parallel Website,
> > sponsored by Intel and developed in partnership with Slashdot Media, is your
> > hub for all things parallel software development, from weekly thought
> > leadership blogs to news, videos, case studies, tutorials and more. Take a
> > look and join the conversation now. http://goparallel.sourceforge.net/
> > _______________________________________________
> > Dbpedia-discussion mailing list
> > Dbpedia-discussion@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
> --
> Magnus Knuth
>
> Hasso-Plattner-Institut für Softwaresystemtechnik GmbH
> Prof.-Dr.-Helmert-Str. 2-3
> 14482 Potsdam
>
> Registered at Amtsgericht Potsdam (district court), HRB 12184
> Managing Director: Prof. Dr. Christoph Meinel
>
> tel:     +49 331 5509 547
> email:   magnus.kn...@hpi.de
> web:     http://www.hpi.de/
> webID:   http://magnus.13mm.de/
>



-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   ontolo...@gmail.com
http://legalentityidentifier.info/lei/lookup