Hi all,

We are now benchmarking several triple stores that support inference through forward chaining against a system that does a particular form of query rewriting.
The benchmark we are using is simple: an extended version of LUBM, using big datasets (LUBM 1000, 8000, 15000, 250000). From Jena we would like to benchmark loading time, inference time, and query answering time, using both TDB and SDB. Inference should be done with limited amounts of memory, the less the better. However, we are having difficulties understanding what the fair way to do this is. Also, the system used for these benchmarks should be a simple machine, not a cluster or a server with large resources. We would like to ask the community for help to approach this in the best way possible, hence this email :). Here are some questions and ideas.

Is it the case that the default inference engine of Jena requires all triples to be in memory? Is it not possible to run inference directly over a TDB or SDB store? If that is so, what would be a fair way to benchmark the system?

Right now we are thinking of a workflow as follows:

1. Start a TDB or SDB store.
2. Load 10 LUBM universities in memory and compute the closure using

   Reasoner reasoner = ReasonerRegistry.getOWLReasoner();
   InfModel inf = ModelFactory.createInfModel(reasoner, monto, m);

   storing the result in SDB or TDB.
3. When finished, query the store directly.

Is this the most efficient way to do it? Are there important parameters (besides the number of universities used in the computation of the closure) that we should tune to guarantee a fair evaluation? Are there any documents that we could use to guide ourselves during tuning of Jena?

Thank you very much in advance everybody,

Best regards,
Mariano

Mariano Rodriguez Muro
http://www.inf.unibz.it/~rodriguez/
KRDB Research Center
Faculty of Computer Science
Free University of Bozen-Bolzano (FUB)
Piazza Domenicani 3, I-39100 Bozen-Bolzano BZ, Italy
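P.S. For concreteness, here is a sketch of the workflow we have in mind (steps 1-3 above), assuming Jena's standard inference and TDB APIs. Package names are those of recent Apache Jena releases (older 2.x releases use the com.hp.hpl.jena prefix), and the file names and the "lubm-tdb" directory are placeholders, not actual LUBM artifacts.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.tdb.TDBFactory;

public class LubmClosure {

    // Compute the OWL closure of (ontology, data) with the in-memory
    // reasoner and materialize every resulting triple into a TDB-backed
    // model. Returns the number of triples stored.
    static long materializeClosure(Model ontology, Model data, Dataset tdb) {
        Reasoner reasoner = ReasonerRegistry.getOWLReasoner();
        InfModel inf = ModelFactory.createInfModel(reasoner, ontology, data);
        Model stored = tdb.getDefaultModel();
        stored.add(inf);  // iterating the InfModel forces full materialization
        return stored.size();
    }

    public static void main(String[] args) {
        // Load the ontology and one batch of generated data in memory
        // (placeholder file names).
        Model ontology = ModelFactory.createDefaultModel().read("univ-bench.owl");
        Model data = ModelFactory.createDefaultModel().read("lubm-batch.nt");

        // Step 1: open (or create) an on-disk TDB store.
        Dataset tdb = TDBFactory.createDataset("lubm-tdb");

        // Step 2: compute the closure and store the result.
        long n = materializeClosure(ontology, data, tdb);
        System.out.println("Materialized " + n + " triples");
        tdb.close();

        // Step 3: queries can now be run directly against the store,
        // with no reasoner attached.
    }
}
```

Does this match what the default engine expects, or is there a more memory-friendly way to drive the materialization?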
