ANN: New Berlin SPARQL Benchmark results for datasets ranging from 10 million to 150 billion RDF triples

Christian Bizer Mon, 29 Apr 2013 04:55:30 -0700

Hi all,

Berlin SPARQL Benchmark (BSBM) is a benchmark for measuring the performance of storage systems that expose SPARQL endpoints. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors. The benchmark defines two query mixes:

1. The query mix of the Explore use case <http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/ExploreUseCase/index.html> illustrates the search and navigation pattern of a consumer looking for a product via some web portal.
2. The query mix of the Business Intelligence use case <http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html> simulates different stakeholders asking analytical questions against the dataset. The query mix relies heavily on SPARQL 1.1 constructs like GROUP BY and COUNT() and is designed to touch large portions of the benchmark dataset.
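To give a rough idea of what a GROUP BY / COUNT() query does, here is a minimal sketch. It is not taken from the BSBM query mix: the ex: prefix, the ex:vendor property, the toy triples and the function offers_per_vendor are all made up for illustration, and the aggregation is emulated in plain Python rather than run against a SPARQL endpoint.

```python
from collections import Counter

# Hypothetical SPARQL 1.1 aggregate query, shown for comparison only.
# It is NOT one of the BSBM Business Intelligence queries.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?vendor (COUNT(?offer) AS ?offers)
WHERE { ?offer ex:vendor ?vendor }
GROUP BY ?vendor
ORDER BY DESC(?offers)
"""

# Toy (subject, predicate, object) triples; all names are invented.
TRIPLES = [
    ("ex:offer1", "ex:vendor", "ex:vendorA"),
    ("ex:offer2", "ex:vendor", "ex:vendorA"),
    ("ex:offer3", "ex:vendor", "ex:vendorB"),
    ("ex:offer1", "ex:product", "ex:product1"),
]


def offers_per_vendor(triples):
    """Emulate GROUP BY ?vendor with COUNT(?offer) over in-memory triples."""
    # Keep only triples matching the pattern { ?offer ex:vendor ?vendor }
    # and count how often each vendor appears as the object.
    counts = Counter(o for _, p, o in triples if p == "ex:vendor")
    # most_common() sorts by count, descending, like ORDER BY DESC(?offers).
    return counts.most_common()


if __name__ == "__main__":
    for vendor, offers in offers_per_vendor(TRIPLES):
        print(vendor, offers)
```

A store has to evaluate the same kind of grouping over the full dataset, which is why these queries touch much more data than the Explore queries.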

I'm happy to announce the results of a new BSBM benchmark experiment. The experiment compares the performance of


1. BigData
2. BigOwlim
3. Jena TDB
4. Virtuoso

on a single machine using datasets ranging from 10 million to 1 billion RDF triples (Explore and Business Intelligence query mixes).


In addition, it compares the performance of

1. BigOwlim
2. Virtuoso

on a cluster of 8 machines using datasets ranging from 10 billion to 150 billion RDF triples (Explore and Business Intelligence query mixes).


The results of the experiment can be found at

http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/

I think that the results are quite impressive and demonstrate that SPARQL stores have matured considerably over the last few years.

A year ago, many RDF stores still had problems with the SPARQL 1.1 constructs GROUP BY and COUNT() and were thus not able to execute the Business Intelligence query mix. Now, all systems pass this test and some of the systems show impressive performance on grouping and aggregating the data.

The 150 billion triples experiment has shown that given proper hardware, it is possible to run analytical queries on amounts of data that are beyond most (all?) of today's use cases: The whole LOD Cloud [1] is estimated to consist only of 31 billion triples; the RDFa, Microdata and Microformat dataset extracted by the WebDataCommons [2] project from 3 billion HTML pages only consists of 7.3 billion triples. So, 150 billion triples leave quite some room for the further growth of structured data on the Web ;-)
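As a back-of-the-envelope check on that headroom, the figures quoted above can be compared directly. This is only a sketch: the helper name headroom is made up, and adding the two datasets together assumes they do not overlap, which may not hold in practice.

```python
# Dataset sizes quoted in this announcement, in billions of triples.
BENCHMARK = 150.0
LOD_CLOUD = 31.0
WEB_DATA_COMMONS = 7.3


def headroom(benchmark, *datasets):
    """Return how many times the given datasets fit into the benchmark size.

    Summing the datasets assumes they share no triples (an assumption made
    here for illustration, not a claim from the benchmark results).
    """
    return benchmark / sum(datasets)


if __name__ == "__main__":
    # Benchmark size relative to the LOD Cloud estimate alone.
    print(round(headroom(BENCHMARK, LOD_CLOUD), 1))
    # Benchmark size relative to LOD Cloud plus the WebDataCommons dataset.
    print(round(headroom(BENCHMARK, LOD_CLOUD, WEB_DATA_COMMONS), 1))
```

Under these assumptions, the largest benchmark dataset is a bit under five times the LOD Cloud estimate, and a bit under four times the two datasets combined.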

More information about the Berlin SPARQL Benchmark, the exact specification of the benchmark query mixes, and results from previous benchmarking experiments can be found at


http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/

Many thanks to Peter Boncz and Minh-Duc Pham, who conducted the new experiment as part of the EU project LOD2 and provided their results for publication on the BSBM website.


Cheers,

Chris

[1] http://lod-cloud.net/state/
[2] http://www.webdatacommons.org/
