On 26/02/15 18:37, Stephen Allen wrote:
I would propose in the future that we actually store and not
just index the document so that it can be appropriately identified and
deleted.  This would require a change to existing Lucene databases (we
should provide a tool to reindex existing data).  An alternative to
actually storing the value would be to generate a hash of the
subject+predicate+object and store that as an identifier.

I second storing the original value in the Lucene index, at least as an option. It would obviously increase the index size, though I suspect the increase would be rather minor compared to the overall (TDB + text index) database size. This would be similar to how LARQ used to work, though LARQ only provided access to the values, not the subject resources.

With some additional code, it would also allow accessing the actual matched value from the SPARQL query. Something like this:

(?s ?value) text:query 'word' .

Then you could also easily check that the triple actually exists in the current RDF data (and in the current graph), with a pattern such as this:

?s rdfs:label ?value .
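
Put together, a query using such an extension could look roughly like this (only a sketch; the (?s ?value) subject list is the proposed syntax, not something jena-text supports today):

PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?value WHERE {
  # proposed: bind the matched literal to ?value alongside the subject
  (?s ?value) text:query 'word' .
  # confirm the matched triple still exists in the current data
  ?s rdfs:label ?value .
}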


For me, it would probably allow some optimization of queries that currently have to do a bit of detective work to find out which value actually matched the query. I'm currently doing queries somewhat like this:

?s text:query (skos:altLabel 'word*') .
?s skos:altLabel ?value .
FILTER (STRSTARTS(?value, 'word'))

This is inefficient when there happen to be lots of skos:altLabel values per subject, as there are in, for example, the AGROVOC thesaurus data.
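
With the matched value available directly, the same lookup could drop the STRSTARTS filter entirely. Again only a sketch, assuming the proposed (?s ?value) syntax:

PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?s ?value WHERE {
  # proposed: ?value is bound to the literal that matched 'word*'
  (?s ?value) text:query (skos:altLabel 'word*') .
  # keep the check that the triple exists in the current data
  ?s skos:altLabel ?value .
}

That would avoid scanning every skos:altLabel of each matching concept just to recover the value that actually matched.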

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi
