(a bit of a "thinking out loud" message - feel free to comment)
I find myself wanting to test out ideas and wanting to quickly and
easily run performance tests to see if some change makes an observable
difference. Such changes may be small (e.g. a different implementation
of Binding) or they may be significant (e.g. paged radix trees for
indexing). At the moment, just setting up the test can be time-consuming.
There are various RDF benchmarks but each is a standalone system and
each generates results in its own format. It's hard enough to remember
how to get each one running because each is different.
Wouldn't it be nice if there were a standard framework for running
performance tests?
It would make it quicker to develop new benchmarks: every benchmark
has a target scenario in mind, and outside that scenario it's hard to
get much insight from the results.
It would also make it quicker to run on different setups, without
having to port each of the various existing benchmarks.
So the framework is:
+ an environment to run performance tests
+ a library of pre-developed/contributed tests
  (data + query mix scripts, drivers for specific systems)
+ common output formats
+ common ways to report results (spreadsheets, graphing tools)
+ documented and tested.
It could also be used for performance regression testing.
To this end, I've started a new module in SVN Experimental/JenaPerf. At
the moment it is no more than some small doodlings to get the classes
and structure right (it's in Scala). On the principle of start small
and iterate, it's going to be for SPARQL query tests first for something
easy, and generating CSV files of results. Scripting will be "unsubtle"
:-) Plugging-in runtime analysis tools is for "much later".
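To make the intended shape concrete, here's a minimal sketch of a pluggable driver plus CSV output. The names (QueryDriver, PerfHarness, etc.) are illustrative only, not what's in the SVN module; a real driver would wrap ARQ rather than the no-op stand-in used here:

```scala
// A minimal sketch of the harness shape, not the actual JenaPerf code:
// a pluggable driver runs each query, the harness times it, and
// results are collected as CSV. Names here are illustrative only.

trait QueryDriver {
  // Execute one query against the system under test.
  def run(query: String): Unit
}

final case class Result(name: String, avgMillis: Long)

object PerfHarness {
  // Time `reps` executions of a query; report the average in ms.
  def time(driver: QueryDriver, name: String, query: String, reps: Int = 5): Result = {
    val start = System.nanoTime()
    var i = 0
    while (i < reps) { driver.run(query); i += 1 }
    Result(name, (System.nanoTime() - start) / 1000000L / reps)
  }

  // Render results as simple CSV, one row per test.
  def toCsv(results: Seq[Result]): String =
    ("test,avg_ms" +: results.map(r => s"${r.name},${r.avgMillis}")).mkString("\n")
}

// Stand-in driver; a real one would execute the query with ARQ.
object NoopDriver extends QueryDriver {
  def run(query: String): Unit = ()
}

val report = PerfHarness.toCsv(Seq(
  PerfHarness.time(NoopDriver, "q1", "SELECT * WHERE { ?s ?p ?o }")
))
println(report)
```

The point of the driver trait is that the same query mix and reporting code can be reused across systems under test; only the driver changes.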
The other thing it would be good to have is a common public copy of the
various datasets so exactly the same data can be used in different
places. There is a time and place for randomized data creation ... and
a time and place for fixed data.
As these are quite big, SVN (or the like) is not the place for them,
nor are they the focus of the Apache CMS.
For now, I'll rsync what I have to appear in
http://people.apache.org/andy/RDF_Data. (This will have to be done
incrementally during "off peak" hours because it does rather use up all
the upstream bandwidth and my colleagues or family, depending on
location, might like to use some of it as well.)
What are good benchmarks? The usual suspects are LUBM, BSBM, SP2B. We
are interested in one that is centred on the queries and query patterns
we see in linked data applications.
Any other resources to draw on?
JUnitPerf (and things of that ilk) assume you are writing tests in code.
Such frameworks seem to provide little here because the bulk of the work
is writing the SPARQL-specific code. Useful learnings though.
The SPARQL-WG tests are scripted (in RDF) and that has worked very well.
Andy