27.02.2015, 18:06, Andy Seaborne wrote:
This is inefficient if there happen to be lots of skos:altLabel values,
as there are in e.g. AGROVOC thesaurus data.
How many skos:altLabel values can occur in that dataset?
As an extreme example, <http://aims.fao.org/aos/agrovoc/c_1548> (the
country Chile) has 433 altLabels. The typical case (if there's such a
thing - it's probably a long tail distribution) is more like a dozen per
concept. AGROVOC has terms in over 20 languages. Queries involving the
literals tend to be a bit slow...
jena-text is a bit misnamed. It's an entity index: "find subjects such
that ...". Entity indexes make the conjunctive use cases work: "find
entities such that :property1 matches ... and :property2 matches ...".
The example above is closer to a text index (query -> literal). LARQ
could do both in different configurations (not at the same time), though
people tended to use it as a text index and then look in the RDF to make
it an entity index. It can't do the conjunctive use case in a single
call, nor is it particularly easy to manage specific properties in
different ways.
I have come to realise that we might provide both kinds of index
separately. A tightly managed literal-text-index could have deeper
integration into query processing e.g. FILTER expressions.
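
To make the distinction between the two kinds of index concrete, here is
a minimal Python sketch. It is a toy model only, not how jena-text or
LARQ are implemented; the triples, the ex: identifiers and all function
names are made up for illustration.

```python
from collections import defaultdict

# Toy triples (subject, property, literal). Hypothetical data.
TRIPLES = [
    ("ex:c1", "skos:prefLabel", "Chile"),
    ("ex:c1", "skos:altLabel", "Republic of Chile"),
    ("ex:c2", "skos:prefLabel", "Chili pepper"),
    ("ex:c2", "skos:altLabel", "Chile pepper"),
]


def tokens(text):
    """Very naive analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_entity_index(triples):
    """Entity index: (property, token) -> subjects. Literals are not kept."""
    index = defaultdict(set)
    for subject, prop, literal in triples:
        for token in tokens(literal):
            index[(prop, token)].add(subject)
    return index


def build_literal_index(triples):
    """Text index: token -> matching literals (query -> literal)."""
    index = defaultdict(set)
    for _subject, _prop, literal in triples:
        for token in tokens(literal):
            index[token].add(literal)
    return index


def find_entities(entity_index, conditions):
    """Conjunctive lookup: subjects matching every (property, token) pair."""
    matches = [entity_index.get(cond, set()) for cond in conditions]
    return set.intersection(*matches) if matches else set()


entity_index = build_entity_index(TRIPLES)
literal_index = build_literal_index(TRIPLES)

# "find entities such that prefLabel matches ... and altLabel matches ..."
both = find_entities(
    entity_index,
    [("skos:prefLabel", "pepper"), ("skos:altLabel", "chile")],
)

# query -> literal: which literals mention "chile"?
literals = literal_index.get("chile", set())
```

The entity index answers the conjunctive question in one lookup but has
nothing to say about which literal matched; the literal index is the
other way round.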
I don't oppose, but I don't really follow either. Is there something
fundamentally wrong with the (?s ?value) text:query 'blah' query style
that I suggested? It's not like it's unusual to store the actual values
in a Lucene index... Lucene supports it (and Solr too), LARQ does it,
many people do it. I understand that not all people will need it (and
the associated size/performance costs), but it could be made optional.
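
A rough sketch of what I mean by "optional", in plain Python rather than
Lucene. This is only an illustration of the idea under my own
assumptions: the store_values flag, the function names and the sample
data are all hypothetical and do not correspond to any existing
jena-text setting.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, property, literal).
SAMPLE = [
    ("ex:c1", "skos:prefLabel", "Chile"),
    ("ex:c1", "skos:altLabel", "Republic of Chile"),
    ("ex:c2", "skos:prefLabel", "Chili pepper"),
    ("ex:c2", "skos:altLabel", "Chile pepper"),
]


def build_index(triples, store_values=True):
    """Map token -> list of (subject, value) entries.

    With store_values=False the value slot is None, which models an index
    that only knows the subject and keeps the index smaller.
    """
    index = defaultdict(list)
    for subject, _prop, literal in triples:
        entry = (subject, literal if store_values else None)
        for token in set(literal.lower().split()):
            index[token].append(entry)
    return index


def text_query(index, term):
    """Return distinct (subject, value) hits for a single term, in index order."""
    seen, hits = set(), []
    for entry in index.get(term.lower(), []):
        if entry not in seen:
            seen.add(entry)
            hits.append(entry)
    return hits


with_values = text_query(build_index(SAMPLE, store_values=True), "chile")
without_values = text_query(build_index(SAMPLE, store_values=False), "chile")
```

With values stored, each hit already carries the matching literal, so
there is no need to go back to the RDF and scan every skos:altLabel of
the subject. Without them, only the subject comes back.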
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi