On 11/10/11 08:12, Paolo Castagna wrote:
Hi Andy,
are you planning to put a few datasets in SVN together with the queries in
JenaPerf?
I saw a data directory for LUBM but no data in it:
https://svn.apache.org/repos/asf/incubator/jena/Experimental/JenaPerf/trunk/Benchmarks/LUBM/Data/
From a user perspective it would be great to just do:
svn co https://svn.apache.org/repos/asf/incubator/jena/Experimental/JenaPerf/trunk JenaPerf
cd JenaPerf
./run
Installing any of LUBM, BSBM or SP2B (although not incredibly complicated) isn't
trivial.
LUBM: The generator and test driver code is GPL. The queries I have are
taken from the published paper, translated by me to SPARQL, so they can
be distributed. Data can be generated.
BSBM: The queries are actually templates and instantiated at runtime
using a configuration file which is generated when the data is
generated. Generating data isn't just creating RDF triples.
The query templates exist in the code base (bsbmtools on SF). I have
been talking to the creators and the license has changed from GPL to AL
(thanks guys). So it will be possible to include queries from the
codebase - the templating will have to be written. (The license change
affects JenaPerf because it is redistributing, unlike downloading and
running.)
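To give an idea of what "the templating will have to be written" might involve, here is a minimal sketch in Python. It assumes placeholders of the form %Name% and a plain dictionary of values; the function name `instantiate`, the sample template and the example.org URI are all hypothetical, and the real BSBM templates and their generated configuration file may well differ.

```python
import re


def instantiate(template: str, bindings: dict[str, str]) -> str:
    """Replace each %Name% placeholder with its bound value.

    Assumption: placeholders look like %Name%. This is an illustrative
    convention, not a statement about the actual bsbmtools format.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in bindings:
            # Fail loudly rather than emit a half-instantiated query.
            raise KeyError(f"no value for placeholder %{name}%")
        return bindings[name]

    return re.sub(r"%(\w+)%", substitute, template)


# Hypothetical template and binding, for illustration only.
template = "SELECT ?product WHERE { ?product a %ProductType% . }"
query = instantiate(template, {"ProductType": "<http://example.org/ProductType1>"})
print(query)
```

In a real driver the bindings would come from whatever configuration the data generator writes out, so that the instantiated queries refer to resources that actually exist in the generated data.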
SP2B is published under BSD.
Andy
From a community and project perspective, it's quite good and helpful
to have a standard set of datasets. Although, I realize that if datasets
are not small, it might take a while to download them.
Can we use .gz datasets with JenaPerf?
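As a rough illustration of what reading a gzipped dataset transparently could look like, here is a small Python sketch. It is not JenaPerf code: the helper names `open_dataset` and `count_triples` and the sample file are made up, and it assumes line-oriented N-Triples data, chosen only because it is easy to count.

```python
import gzip
import os
import tempfile


def open_dataset(path):
    """Open a dataset as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def count_triples(path):
    """Count non-blank, non-comment lines (one triple per line in N-Triples)."""
    with open_dataset(path) as lines:
        return sum(1 for line in lines if line.strip() and not line.startswith("#"))


# Write a tiny hypothetical dataset to a temporary .gz file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.nt.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("# sample data\n")
        out.write("<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n")
        out.write('<http://example.org/s> <http://example.org/p> "v" .\n')
    count = count_triples(path)

print(count)
```

The point is only that callers need not care whether a dataset is compressed, which would keep the download small without changing how a benchmark run is started.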
We could also include small to medium-sized datasets together with JenaPerf
and have a separate checkout/download for larger ones.
What do you think?
Paolo