Re: Fair Benchmarking of SDB, TDB and LUBM 100 > with inference support and limited memory

Andy Seaborne Mon, 05 Dec 2011 14:25:25 -0800

On 05/12/11 13:58, Mariano Rodriguez wrote:

In the case of this first round of benchmarks we want to avoid any Hadoop or
map-reduce approaches. The reason is that we want to have raw numbers for the
core reasoning techniques, in this case forward chaining vs. backward chaining
and our technique called semantic indexes, which is a bit like backward
chaining but with a tiny bit of extra work at loading time. We want to avoid
evaluating benefits from the architecture of the system (map-reduce for
example) because the technique that we are testing can also be extended with
map-reduce and a parallel architecture.

In the past, I've experimented with forward-chaining the schema and doing one step of backward chaining in the query.

Merely forward chaining everything (even just the useful subclass, subproperty, domain and range as is done by riotcmd.infer) causes triple bloat and, at scale, the bloat can reduce the effectiveness of disk caching.
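To make the bloat concrete, here is a minimal Python sketch that forward-chains only the subclass rule (x rdf:type C, C rdfs:subClassOf D gives x rdf:type D) over an in-memory set of triples. It is not how riotcmd.infer is implemented; the function name and the tiny LUBM-like class names are assumptions chosen for illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def materialize_types(triples):
    """Forward-chain the subclass rule to a fixpoint (illustrative only)."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        # Index the direct superclasses of every class seen so far.
        supers = {}
        for s, p, o in result:
            if p == SUBCLASS:
                supers.setdefault(s, set()).add(o)
        # Derive one more level of rdf:type triples.
        new = {
            (s, RDF_TYPE, d)
            for s, p, o in result
            if p == RDF_TYPE
            for d in supers.get(o, ())
        }
        if not new <= result:
            result |= new
            changed = True
    return result

# Hypothetical data: one instance under a two-level class hierarchy.
data = {
    ("GraduateStudent", SUBCLASS, "Student"),
    ("Student", SUBCLASS, "Person"),
    ("alice", RDF_TYPE, "GraduateStudent"),
}
inferred = materialize_types(data)
print(len(data), "triples before,", len(inferred), "after")
```

Every instance gains one extra rdf:type triple per ancestor class, so the growth is proportional to the data size times the hierarchy depth, not to the schema size.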

But pure backward chaining has a horrible access pattern on the data (walking arbitrary length paths):


?x rdf:type/rdfs:subClassOf* :type

?x ?p ?v . ?p rdfs:subPropertyOf* :property

(obviously you don't have to do it this way - this is just the naive way and it can be written in SPARQL 1.1 - it's even in the spec).

Assuming the schema is small compared to the data and fixed, preprocessing the schema to have a single table of (type, supertype) with the transitive closure turns it into two patterns:


?x rdf:type ?var . table(?var, :type)
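The preprocessing step can be sketched in Python as follows. This is only an illustration of the idea, not the SDB prototype: `closure_table` and `instances_of` are hypothetical names, the data is held in memory, and the table is made reflexive so that it matches the zero-or-more semantics of `rdfs:subClassOf*`.

```python
from collections import defaultdict

def closure_table(subclass_edges):
    """Return all (type, supertype) pairs in the reflexive transitive closure."""
    parents = defaultdict(set)
    classes = set()
    for sub, sup in subclass_edges:
        parents[sub].add(sup)
        classes.update((sub, sup))
    table = set()
    for cls in classes:
        # Depth-first walk up the hierarchy; 'seen' also guards against cycles.
        seen = {cls}
        stack = [cls]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        table.update((cls, sup) for sup in seen)
    return table

def instances_of(type_triples, table, target):
    """Evaluate: ?x rdf:type ?var . table(?var, target)."""
    # A class that never appears in the schema still matches itself.
    matching = {t for (t, sup) in table if sup == target} | {target}
    return {x for (x, t) in type_triples if t in matching}

# Hypothetical schema and data.
schema = [("GraduateStudent", "Student"), ("Student", "Person")]
types = [("alice", "GraduateStudent"), ("bob", "Person"), ("carol", "Course")]
table = closure_table(schema)
print(sorted(instances_of(types, table, "Person")))
```

The path walk happens once over the (small, fixed) schema; at query time the data is touched by a single rdf:type pattern joined against the table.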

LUBM is unusual in several ways. All systems I know of load faster on LUBM than on any other benchmark because it has a low node-to-triple ratio (i.e. it is very interconnected within each university). RDFS-level inference increases this effect because inference can add triples but not create new RDF terms. Loading nodes means the bytes for the URI or literal need to be stored, which is more work.
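As a rough illustration of the ratio, the Python sketch below counts distinct RDF terms per triple before and after adding inferred rdf:type triples. The function name and the toy data are assumptions, and counting every subject, predicate and object as a "node" is a simplification.

```python
def node_to_triple_ratio(triples):
    """Distinct RDF terms divided by the number of triples (illustrative)."""
    terms = {term for triple in triples for term in triple}
    return len(terms) / len(triples)

# Hypothetical base data: 3 triples over 6 distinct terms.
base = {
    ("GraduateStudent", "rdfs:subClassOf", "Student"),
    ("Student", "rdfs:subClassOf", "Person"),
    ("alice", "rdf:type", "GraduateStudent"),
}
# Inferred triples reuse existing terms only.
with_inference = base | {
    ("alice", "rdf:type", "Student"),
    ("alice", "rdf:type", "Person"),
}
print(node_to_triple_ratio(base), node_to_triple_ratio(with_inference))
```

The term count stays the same while the triple count grows, so the ratio drops after inference.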

It would be easy to add this to TDB (the prototyping was for SDB where it's more important due to JDBC-isms) - doing it as part of the more general property tables would be interesting.


TDB scales much better than SDB (load and query).

        Andy

