Hi all,
I would like to propose a new feature for jena-text, making it possible
to store the original literals in the Lucene index for fast retrieval.
I've raised this before, but at the time it would have been difficult to
implement. With the recent jena-text work by Alexis Miara and myself, I
think it should now be feasible with relatively little effort.
It would work like this:
1. Configure jena-text to store literals (the default would be off):
   <#entMap> a text:EntityMap ;
       text:entityField "uri" ;
       text:langField "lang" ;
       text:storeValues true ;
       [...]
2. Add some data, for example this triple:
   :myresource rdfs:label "My resource"@en .
3. Query the index like this:
   SELECT * {
     (?s ?score ?literal) text:query "resource" .
   }
In the query result, ?literal would be bound to "My resource"@en.
In practice, the literal value would be stored using Lucene's facility
for storing the original field value alongside the indexed terms
(TextField.TYPE_STORED), similar to how LARQ worked. If the langField
setting is in use, the language field would hold the language tag as
well, so the returned literal could include it. If not, the returned
literals would have no language tag (in the example above, the value
would be just "My resource").
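To make the storage and retrieval behaviour above a bit more concrete,
here is a rough Python sketch. It is a toy model, not the Lucene or
jena-text API: a search hit is represented as a plain dict of stored
field values, the "uri" and "lang" field names come from the example
entity map, and "text" is an assumed name for the indexed literal field.
It only illustrates how a literal could be rebuilt from stored fields,
with or without the language tag.

```python
from typing import Dict, Optional

# Toy model only: NOT the Lucene or jena-text API. A "hit" is a dict of
# stored field values, standing in for what a Lucene document would return
# if the literal were indexed with a stored field type.

ENTITY_FIELD = "uri"  # from the example entity map (text:entityField)
LANG_FIELD = "lang"   # from the example entity map (text:langField)
TEXT_FIELD = "text"   # assumed name of the indexed literal field


def index_literal(subject: str, lexical_form: str, lang_tag: Optional[str],
                  store_values: bool, use_lang_field: bool) -> Dict[str, str]:
    """Return the stored fields a hit would carry for one indexed triple."""
    doc = {ENTITY_FIELD: subject}
    if store_values:
        # Keep the original text alongside the indexed tokens.
        doc[TEXT_FIELD] = lexical_form
    if use_lang_field and lang_tag:
        doc[LANG_FIELD] = lang_tag
    return doc


def literal_from_hit(doc: Dict[str, str]) -> Optional[str]:
    """Rebuild the literal, in N-Triples-like syntax, from stored fields.

    Returns None when no value was stored (a choice of this toy model).
    """
    value = doc.get(TEXT_FIELD)
    if value is None:
        return None
    # Minimal escaping: backslashes and double quotes only.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    lang = doc.get(LANG_FIELD)
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


if __name__ == "__main__":
    hit = index_literal("http://example.org/myresource", "My resource", "en",
                        store_values=True, use_lang_field=True)
    print(literal_from_hit(hit))  # "My resource"@en
```

With the lang field in use, the hit for the example triple yields
"My resource"@en; without it, just "My resource". Returning None when
nothing was stored is a choice of this sketch, not part of the proposal.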
The benefit is that there would be no need to go back to the RDF data
to find the original literal that matched. This would simplify, and
probably speed up, many of the SPARQL queries that I use in the Skosmos
application.
I already have some preliminary code and tests for this, but they are
not yet ready for public review. I'll open a pull request once I have
something to show.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi