A few minor additions:

On 20 April 2014 18:58, Volha Bryl <vo...@informatik.uni-mannheim.de> wrote:
> Hi,
>
> As I am the one who produced the statistics you mention, let me try to clarify.
> [1] and [2] are based on DBpedia dumps for 3.9 and 3.8, respectively. The
> latest DBpedia paper has the numbers for 3.8, and on the statistics page for
> 3.8 [2] you indeed find 3.7 mln entities. Why is "400 mln triples" not
> there? Because [2] counts *just* raw property statements extracted from
> infoboxes (65 mln), type statements (13.7 mln) and mapped (to DBpedia
> ontology) property statements (33.7 mln). It does not, however, count many
> other triples: those coming from inter-language links, abstracts,
> categories, links to other resources and so on; check the download pages for
> the whole list [3,4]. If you count all of these, you will perhaps arrive at
> 400 mln triples. In fact,
> SELECT COUNT(*) WHERE {?x ?y ?z}
> executed against the DBpedia SPARQL endpoint returns 825,761,509 at the moment.
> And actually I am not sure that all datasets available at [5] are loaded
> into the endpoint

No, only certain datasets are loaded. They are listed here:
http://wiki.dbpedia.org/DatasetsLoaded39
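
For anyone who wants to repeat the count quoted above from a script, here is a minimal Python sketch. It assumes the public endpoint is reachable at http://dbpedia.org/sparql and answers a plain GET with a SPARQL JSON result; the function names are made up for illustration, and the query uses the aliased SPARQL 1.1 form of COUNT rather than the exact text quoted above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the public DBpedia SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

# SPARQL 1.1 form of the count query (the aggregate is given an alias).
QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?x ?y ?z }"


def build_url(endpoint=ENDPOINT, query=QUERY):
    """Return the GET URL that carries the query as a 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_count(payload):
    """Extract the single integer from a SPARQL JSON results document."""
    data = json.loads(payload)
    binding = data["results"]["bindings"][0]
    # The result row has exactly one bound variable, whatever its name.
    (cell,) = binding.values()
    return int(cell["value"])


def count_triples(endpoint=ENDPOINT, timeout=60):
    """Run the count query against the endpoint (performs a network call)."""
    request = urllib.request.Request(
        build_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling count_triples() does the actual request. The number it returns depends on which datasets are loaded at the time, so it will not necessarily match the figure quoted above.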

> so the total number for English can be even bigger.
>
> Summarizing, [1,2] are good sources for getting numbers of things/instances.
> For the number of triples, it depends on what you want to count. For types and
> properties refer to [1,2]; for the total number of triples, refer to the SPARQL
> endpoints for English and for the other languages for which endpoints
> exist. Or go through the dumps and count :)

The number of lines in each dataset file is listed in this file:

https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/data/lines-bytes-packed.txt

There are a few comment lines in each file, so the number of triples
is slightly lower, but not by much.

I just counted the lines in all English NT files with the following
command. (grep -v is needed to exclude a few files that contain
almost the same triples as other files.)

grep 'en/.*\.nt' lines-bytes-packed.txt |
  grep -vE 'unredirected|same_as|see_also|chapters|cleaned' |
  awk '{sum+=$3} END {print sum}'

Result for en: 488 million triples.
For all languages: 3.1 billion triples
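
The same sum can be sketched in Python, which may be easier to adapt than the pipeline. This assumes, as the awk $3 above implies, that the third whitespace-separated field of each line in lines-bytes-packed.txt holds the line count; sum_lines is a hypothetical helper name, not something from the extraction framework.

```python
import re

# Same patterns as the two grep calls above.
PATH_RE = re.compile(r"en/.*\.nt")
SKIP_RE = re.compile(r"unredirected|same_as|see_also|chapters|cleaned")


def sum_lines(rows, path_re=PATH_RE, skip_re=SKIP_RE):
    """Sum the third field of every row that matches path_re but not skip_re."""
    total = 0
    for row in rows:
        if not path_re.search(row) or skip_re.search(row):
            continue
        fields = row.split()
        if len(fields) < 3:
            continue
        try:
            total += int(fields[2])
        except ValueError:
            # awk would count a non-numeric field as 0, so skip it here too.
            continue
    return total


# Possible usage, assuming the file has been downloaded locally:
# with open("lines-bytes-packed.txt") as f:
#     print(sum_lines(f))
```

As noted above, the result counts lines rather than triples, so the few comment lines in each file are included.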

Regards,
JC

>
> Cheers,
> Volha
>
>
> [1] http://wiki.dbpedia.org/Datasets39/DatasetStatistics
> [2] http://wiki.dbpedia.org/Datasets38/DatasetStatistics
> [3] http://wiki.dbpedia.org/Downloads39
> [4] http://wiki.dbpedia.org/Downloads38
> [5] http://downloads.dbpedia.org/3.9/en/
>
>
>
>
>
> On 4/19/2014 11:59 PM, Gunaratna, Dalkandura Arachchige Kalpa Shashika Silva
> wrote:
>
> Hi,
>
> I want to know the correct number of instances and total triples for
> the English version of DBpedia 3.9. I have come across the DBpedia statistics
> page (http://wiki.dbpedia.org/Datasets39/DatasetStatistics), but I find it
> confusing to get the numbers right.
>
>
> The reason is that I read in a paper that DBpedia version 3.4 had
> 3.5 million entities (instances) with 672 million triples.
>
>
> Having that in mind, DBpedia statistics page says that version 3.9 has 4
> million (4,004,478) instances and 70 million (70,147,399) raw statements.
>
>
> A recent DBpedia paper
> (http://svn.aksw.org/papers/2013/SWJ_DBpedia/public.pdf) says the English
> version has 3.7 million things described in 400 million triples. I believe
> they are talking about version 3.8. But this number also does not match with
> the version 3.8 table in the statistics page.
>
>
> Could somebody clarify the correct numbers for me, for both the English
> version and the whole of DBpedia: the total number of things (instances) and
> the total number of triples?
>
>
> Thank you very much.
>
>
>
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
>
> --
> ---------------------
> Dr. Volha Bryl
> Postdoctoral Researcher
> Chair of Information Systems V
> Web-based Systems Group
> Universität Mannheim
> B6, 26, Room C1.03
> D-68131 Mannheim
>
> Tel.: +49 621 181 2657
> Mail: vo...@informatik.uni-mannheim.de
>
>
