Re: Storing values in Lucene index

Osma Suominen Fri, 27 Feb 2015 09:14:44 -0800

27.02.2015, 18:06, Andy Seaborne wrote:

This is inefficient if there happen to be lots of skos:altLabel values,
as there are in e.g. AGROVOC thesaurus data.


How many skos:altLabel can occur in that dataset?

As an extreme example, <http://aims.fao.org/aos/agrovoc/c_1548> (the country Chile) has 433 altLabels. The typical case (if there's such a thing - it's probably a long-tail distribution) is more like a dozen per concept. AGROVOC has terms in over 20 languages. Queries involving the literals tend to be a bit slow...

jena-text is a bit misnamed.  It's an entity index : "find subjects such
that ..."  Entity indexes make the conjunctive use cases work, "find
entities such that :property1 matches ... and :property2 matches ...".
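
A toy sketch of the conjunctive case, to make the idea concrete. This is
plain Python, not jena-text or Lucene code; the class EntityIndex, its
methods and the ex: subjects are hypothetical names made up for the
illustration, and tokenisation is just lowercasing and splitting on
whitespace.

```python
from collections import defaultdict


class EntityIndex:
    """Toy entity index: maps (property, token) to the subjects that match."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, subject, prop, literal):
        # Index every whitespace-separated token of the literal under its property.
        for token in literal.lower().split():
            self._postings[(prop, token)].add(subject)

    def match(self, prop, token):
        # Return a copy so callers cannot mutate the index.
        return set(self._postings.get((prop, token.lower()), set()))

    def match_all(self, *conditions):
        """Conjunctive lookup: subjects matching every (property, token) pair."""
        results = [self.match(prop, token) for prop, token in conditions]
        return set.intersection(*results) if results else set()


index = EntityIndex()
index.add("ex:c1", "skos:prefLabel", "Chile")
index.add("ex:c1", "skos:altLabel", "Republic of Chile")
index.add("ex:c2", "skos:prefLabel", "Chile pepper")

# "find entities such that prefLabel matches ... and altLabel matches ..."
hits = index.match_all(("skos:prefLabel", "chile"), ("skos:altLabel", "republic"))
```

The index only hands back subjects, so the conjunction is a set
intersection inside the index and needs no second pass over the RDF.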

The example above is closer to a text index (query -> literal). LARQ
could do both in different configurations (not at the same time), though
people tended to use it as a text index and then look in the RDF to make
it an entity index.  It can't do the conjunctive use case in a single
call, nor is it particularly easy to manage specific properties in
different ways.

I have come to realise that we might provide both kinds of index
separately.  A tightly managed literal-text-index could have deeper
integration into query processing e.g. FILTER expressions.

I don't oppose, but I don't really follow either. Is there something fundamentally wrong with the (?s ?value) text:query 'blah' query style that I suggested? It's not like it's unusual to store the actual values in a Lucene index... Lucene supports it (and Solr too), LARQ does it, many people do it. I understand that not all people will need it (and the associated size/performance costs), but it could be made optional.
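
To illustrate what I mean by optional value storage, here is a rough
Python sketch. It is a toy model only, not Lucene or jena-text code; the
class StoringIndex and the store_values flag are hypothetical names for
the illustration. With storage on, a hit carries the matching literal, so
a (?s ?value) style result comes straight from the index; with storage
off, only the subject comes back and the value has to be looked up in the
RDF afterwards.

```python
from collections import defaultdict


class StoringIndex:
    """Toy text index that can optionally keep the literal with each hit."""

    def __init__(self, store_values=True):
        self.store_values = store_values
        self._postings = defaultdict(list)

    def add(self, subject, literal):
        # Keep the literal only when storage is enabled (it costs index space).
        stored = literal if self.store_values else None
        for token in set(literal.lower().split()):
            self._postings[token].append((subject, stored))

    def query(self, token):
        """Return (subject, value) pairs; value is None if nothing was stored."""
        return list(self._postings.get(token.lower(), []))


idx = StoringIndex(store_values=True)
idx.add("ex:c1", "Republic of Chile")
idx.add("ex:c1", "Chili")
idx.add("ex:c2", "Chile pepper")
pairs = idx.query("chile")

lean = StoringIndex(store_values=False)
lean.add("ex:c1", "Republic of Chile")
lean_pairs = lean.query("chile")
```

The size cost shows up as the extra copy of each literal held per posting
in this toy; a real index would store the value once per document.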


-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
