Re: indexing [was Re: Performance of successive identical queries]

Andy Seaborne Thu, 08 Mar 2012 03:56:51 -0800

On 07/03/12 21:05, Andy Seaborne wrote:

On 06/03/12 21:22, Rob Vesse wrote:

If I might throw my 2 cents into the mix...


In dotNetRDF in the recent releases (2 weeks ago) we added the ability
to automatically have a dataset linked to a full text index and keep
that index in sync with changes in the dataset. My approach to this was
to use the decorator pattern, so what I have is a base decorator [1]
which is simply an implementation of our dataset interface which passes
through all calls to the underlying dataset. We then have a decorator
[2] which extends this base class and adds the logic to intercept the
calls that alter the dataset so that it updates the index as well as
passing the call through to the underlying dataset.
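
A minimal sketch of the two-layer decorator described above, in Java rather than C#. All names here (SimpleDataset, MemoryDataset, PassThroughDataset, IndexedDataset) are hypothetical and are not the dotNetRDF or Jena API; the "index" is just a set standing in for a real full text index.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal dataset interface (assumption, not a real library API).
interface SimpleDataset {
    void add(String triple);
    void delete(String triple);
    boolean contains(String triple);
}

// Plain in-memory dataset standing in for the real store.
class MemoryDataset implements SimpleDataset {
    private final Set<String> triples = new HashSet<>();
    public void add(String triple) { triples.add(triple); }
    public void delete(String triple) { triples.remove(triple); }
    public boolean contains(String triple) { return triples.contains(triple); }
}

// Base decorator: passes every call through to the underlying dataset.
class PassThroughDataset implements SimpleDataset {
    protected final SimpleDataset underlying;
    PassThroughDataset(SimpleDataset underlying) { this.underlying = underlying; }
    public void add(String triple) { underlying.add(triple); }
    public void delete(String triple) { underlying.delete(triple); }
    public boolean contains(String triple) { return underlying.contains(triple); }
}

// Indexing decorator: intercepts only the calls that alter the dataset,
// updates the index, then passes the call through.
class IndexedDataset extends PassThroughDataset {
    final Set<String> index = new HashSet<>(); // stand-in for a full text index
    IndexedDataset(SimpleDataset underlying) { super(underlying); }
    @Override public void add(String triple) {
        index.add(triple);
        super.add(triple);
    }
    @Override public void delete(String triple) {
        index.remove(triple);
        super.delete(triple);
    }
}
```

The index stays in sync only for updates made through the IndexedDataset; anything written directly to the underlying dataset is not seen by the decorator.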

Since all updates go through the dataset interface, this allows us to
catch all updates and keep the full text index up to date. Whether this
is applicable to Jena depends on whether all updates go through a
single dataset interface in Jena, which is a part of the code base I am
not so familiar with.

Rob

[1]
http://dotnetrdf.svn.sourceforge.net/viewvc/dotnetrdf/Trunk/Libraries/core/Query/Datasets/WrapperDataset.cs?revision=2157&view=markup


[2]
http://dotnetrdf.svn.sourceforge.net/viewvc/dotnetrdf/Trunk/Libraries/query.fulltext/Datasets/FullTextIndexedDataset.cs?revision=2157&view=markup


Other than the fact it's called *Wrapper, the decorator pattern is used
in ARQ and TDB in various places.

TDB used to support graphs without datasets, so it's a bit more mixed
than just catching DatasetGraph. But that's history.

We could change GraphTDBBase to use DatasetGraph and so everything goes
via DatasetGraph .... or even junk GraphTDB altogether and have
standardised graphs-over-datasetgraphs.

SPARQL Query bypasses all this as does bulkloading.

SPARQL Update does use DatasetGraph.

see DatasetGraphWrapper

Eclipse practically writes such classes if you implement an interface
and use "quick fix".

Andy


OK - it's not that simple :-)

GraphTDB also provides access to internal structures for the query engine etc. It avoids the need to keep casting to get from "graph" to tuple tables, which would be needed if it were generic and gave access to the dataset. I avoid designs with excessive casting because of the chance a non-thing is passed in.

So GraphTDB sits alongside DatasetGraphTDB, and both need wrappers at the moment if you want to trap API update calls.
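
A rough sketch of why both entry points need wrapping, using hypothetical stand-ins (SimpleGraph, SimpleDatasetGraph, Store, TrappingGraph, TrappingDatasetGraph) rather than the real GraphTDB and DatasetGraphTDB classes. Both wrappers report to the same listener; an update made through an unwrapped entry point is not trapped.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical graph-level entry point (plays the role of GraphTDB).
interface SimpleGraph {
    void add(String triple);
}

// Hypothetical dataset-level entry point (plays the role of DatasetGraphTDB).
interface SimpleDatasetGraph {
    void add(String graphName, String triple);
}

// One store exposing both entry points over the same storage.
class Store implements SimpleGraph, SimpleDatasetGraph {
    final Set<String> quads = new HashSet<>();
    public void add(String triple) { quads.add("default " + triple); }
    public void add(String graphName, String triple) { quads.add(graphName + " " + triple); }
}

// Wrapper trapping updates made through the graph-level API.
class TrappingGraph implements SimpleGraph {
    private final SimpleGraph inner;
    private final Consumer<String> onUpdate;
    TrappingGraph(SimpleGraph inner, Consumer<String> onUpdate) {
        this.inner = inner;
        this.onUpdate = onUpdate;
    }
    public void add(String triple) {
        onUpdate.accept(triple);
        inner.add(triple);
    }
}

// Wrapper trapping updates made through the dataset-level API.
class TrappingDatasetGraph implements SimpleDatasetGraph {
    private final SimpleDatasetGraph inner;
    private final Consumer<String> onUpdate;
    TrappingDatasetGraph(SimpleDatasetGraph inner, Consumer<String> onUpdate) {
        this.inner = inner;
        this.onUpdate = onUpdate;
    }
    public void add(String graphName, String triple) {
        onUpdate.accept(triple);
        inner.add(graphName, triple);
    }
}
```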


        Andy
