Hi all,
The Berlin SPARQL Benchmark (BSBM) is a benchmark for measuring the
performance of storage systems that expose SPARQL endpoints. The
benchmark is built around an e-commerce use case in which a set of
products is offered by different vendors. The benchmark defines two query
mixes:
1. The query mix of the Explore use case
<http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/ExploreUseCase/index.html>
illustrates the search and navigation pattern of a consumer looking for
a product
via some web portal.
2. The query mix of the Business Intelligence use case
<http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html>
simulates different stakeholders asking analytical questions against the
dataset.
The query mix relies heavily on SPARQL 1.1 constructs like GROUP BY and
COUNT() and is designed to touch large portions of the benchmark dataset.
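For readers who have not used the SPARQL 1.1 aggregation constructs, here
is a minimal sketch of the kind of query meant by "GROUP BY and COUNT()".
It is not taken from the benchmark specification: the ex: vocabulary, the
toy triples and the helper offers_per_vendor are all made up for
illustration, and the Python function only emulates on in-memory data what
such a query would compute on a store.

```python
from collections import Counter

# Hypothetical aggregation query in the style of an analytical workload.
# The ex: vocabulary is invented for this sketch; it is NOT the BSBM schema.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?vendor (COUNT(?offer) AS ?offers)
WHERE { ?offer ex:vendor ?vendor }
GROUP BY ?vendor
ORDER BY DESC(?offers)
"""

# Toy data: (subject, predicate, object) triples, also invented.
TRIPLES = [
    ("ex:offer1", "ex:vendor", "ex:vendorA"),
    ("ex:offer2", "ex:vendor", "ex:vendorA"),
    ("ex:offer3", "ex:vendor", "ex:vendorB"),
    ("ex:offer1", "ex:price", "19.99"),
]


def offers_per_vendor(triples):
    """Emulate GROUP BY ?vendor with COUNT(?offer) in plain Python."""
    # Keep only triples matching the pattern { ?offer ex:vendor ?vendor }
    # and count how many matches fall into each ?vendor group.
    counts = Counter(obj for _subj, pred, obj in triples if pred == "ex:vendor")
    # most_common() sorts by descending count, like ORDER BY DESC(?offers).
    return counts.most_common()


if __name__ == "__main__":
    for vendor, offers in offers_per_vendor(TRIPLES):
        print(vendor, offers)
```

The point of the sketch is that the store has to touch every matching
triple before it can return a single result row, which is why such queries
stress a system differently from the lookups in the Explore mix.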
I'm happy to announce the results of a new BSBM benchmark experiment.
The experiment compares the performance of
1. BigData
2. BigOwlim
3. Jena TDB
4. Virtuoso
on a single machine using datasets ranging from 10 million to 1 billion
RDF triples (Explore and Business Intelligence query mixes).
In addition, it compares the performance of
1. BigOwlim
2. Virtuoso
on a cluster of 8 machines using datasets ranging from 10 billion to 150
billion RDF triples (Explore and Business Intelligence query mixes).
The results of the experiment are available at
http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/
I think that the results are quite impressive and demonstrate that
SPARQL stores have matured considerably over the last few years.
A year ago, many RDF stores still had problems with the SPARQL 1.1
constructs GROUP BY and COUNT() and were thus not able to execute the
Business Intelligence query mix. Now, all systems pass this test and
some of the systems show impressive performance on grouping and
aggregating the data.
The 150 billion triples experiment shows that, given proper hardware,
it is possible to run analytical queries on amounts of data that are
beyond most (all?) of today's use cases: The whole LOD Cloud [1] is
estimated to consist of only 31 billion triples; the RDFa, Microdata and
Microformat dataset extracted by the WebDataCommons [2] project from 3
billion HTML pages consists of only 7.3 billion triples. So, 150 billion
triples leave quite some room for the further growth of structured data
on the Web ;-)
More information about the Berlin SPARQL benchmark, the exact
specification of the benchmark query mixes, as well as results from
previous benchmarking experiments can be found at
http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
Many thanks to Peter Boncz and Minh-Duc Pham, who conducted the new
experiment as part of the EU project LOD2 and provided their results
for publication on the BSBM website.
Cheers,
Chris
[1] http://lod-cloud.net/state/
[2] http://www.webdatacommons.org/