A few minor additions:

On 20 April 2014 18:58, Volha Bryl <vo...@informatik.uni-mannheim.de> wrote:
> Hi,
>
> As I am the one who compiled the statistics you mention, let me try to
> clarify. [1] and [2] are based on the DBpedia dumps for 3.9 and 3.8,
> respectively. The latest DBpedia paper has the numbers for 3.8, and on the
> statistics page for 3.8 [2] you do indeed find 3.7 million entities. Why is
> "400 million triples" not there? Because [2] counts *just* raw property
> statements extracted from infoboxes (65 million), type statements
> (13.7 million) and property statements mapped to the DBpedia ontology
> (33.7 million). It does not, however, count many other triples: those coming
> from inter-language links, abstracts, categories, links to other resources
> and so on; check the download pages for the whole list [3,4]. If you count
> all of these, you will perhaps arrive at 400 million triples. In fact,
>
>     SELECT COUNT(*) WHERE {?x ?y ?z}
>
> executed against the DBpedia SPARQL endpoint returns 825,761,509 at the moment.
> And actually I am not sure that all datasets available at [5] are loaded
> into the endpoint

No, only certain datasets are loaded. They are listed here:
http://wiki.dbpedia.org/DatasetsLoaded39

> so the total number for English can be even bigger.
>
> Summarizing, [1,2] are good sources for getting the numbers of
> things/instances. For the number of triples, it depends on what you want to
> count. For types and properties refer to [1,2]; for the total number of
> triples, refer to the SPARQL endpoints for English and the other languages
> for which endpoints exist. Or go through the dumps and count :)

The number of lines in each dataset file is listed in this file:
https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/data/lines-bytes-packed.txt

There are a few comment lines in each file, so the number of triples is
slightly lower, but not by much.

I just counted the lines in all English NT files with the following command.
(grep -v is necessary to exclude a few files that contain almost the same
triples as other files.)

    grep 'en/.*\.nt' lines-bytes-packed.txt | grep -vE 'unredirected|same_as|see_also|chapters|cleaned' | awk '{sum+=$3} END {print sum}'

Result for en: 488 million triples. For all languages: 3.1 billion triples.

Regards,
JC

>
> Cheers,
> Volha
>
>
> [1] http://wiki.dbpedia.org/Datasets39/DatasetStatistics
> [2] http://wiki.dbpedia.org/Datasets38/DatasetStatistics
> [3] http://wiki.dbpedia.org/Downloads39
> [4] http://wiki.dbpedia.org/Downloads38
> [5] http://downloads.dbpedia.org/3.9/en/
>
>
> On 4/19/2014 11:59 PM, Gunaratna, Dalkandura Arachchige Kalpa Shashika Silva
> wrote:
>
> Hi,
>
> I want to know the correct number of instances and total triples for the
> English version of DBpedia 3.9. I have come across the DBpedia statistics
> page (http://wiki.dbpedia.org/Datasets39/DatasetStatistics) but I find it
> confusing to get the numbers right.
>
> The reason is that I read in a paper that DBpedia version 3.4 had
> 3.5 million entities (instances) with 672 million triples.
>
> With that in mind, the DBpedia statistics page says that version 3.9 has
> 4 million (4,004,478) instances and 70 million (70,147,399) raw statements.
>
> The recent DBpedia paper
> (http://svn.aksw.org/papers/2013/SWJ_DBpedia/public.pdf) says the English
> version has 3.7 million things described in 400 million triples. I believe
> they are talking about version 3.8. But this number also does not match
> the version 3.8 table on the statistics page.
>
> Could somebody clarify the correct numbers for me, for both the English
> version and the whole of DBpedia: the total number of things (instances)
> and the total number of triples?
>
> Thank you very much.
>
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
> --
> ---------------------
> Dr. Volha Bryl
> Postdoctoral Researcher
> Chair of Information Systems V
> Web-based Systems Group
> Universität Mannheim
> B6, 26, Room C1.03
> D-68131 Mannheim
>
> Tel.: +49 621 181 2657
> Mail: vo...@informatik.uni-mannheim.de
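
Following up on the comment lines mentioned in the reply above: a minimal sketch of how the line-count command could be adjusted to subtract them. It assumes, as the awk command in the reply does, that the line count is the third field of lines-bytes-packed.txt. The function name estimate_triples and the default of two comment lines per file are assumptions for illustration, not something stated in the thread, so inspect a dump file before relying on the default.

```shell
#!/bin/sh
# Estimate the number of triples from a line-count listing such as
# lines-bytes-packed.txt.
# Assumptions (not confirmed by the thread):
#   - the line count is in field 3, as in the awk command quoted above;
#   - every dump file carries the same number of comment lines
#     (default 2 here; hypothetical, adjust after checking a real file).
estimate_triples() {
    listing=$1          # path to the listing file
    comments=${2:-2}    # assumed comment lines per dump file
    # Same filters as the original command: English NT files only,
    # minus files that largely duplicate triples found elsewhere.
    grep 'en/.*\.nt' "$listing" |
        grep -vE 'unredirected|same_as|see_also|chapters|cleaned' |
        awk -v c="$comments" '{ sum += $3 - c } END { print sum + 0 }'
}
```

Called as `estimate_triples lines-bytes-packed.txt`, it prints the adjusted total for the English NT files; passing 0 as the second argument gives the raw line count, as in the original command.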