On 24/06/15 13:00, Osma Suominen wrote:
> Hi all,
>
> I would like to propose a new feature for jena-text, making it possible to store
> the original literals in the Lucene index for fast retrieval. I've talked about
> this before, but at that point it was difficult to implement. With the recent
> jena-text work by Alexis Miara and myself, I think this would now be feasible to
> implement with relatively little effort.
Ooh, excellent.
I did some experiments with a hacked jena-text a while ago along similar lines
as proof-of-performance-concept; it would be nice to have something like
that in mainline jena.
> In practice, the literal value would be stored using the Lucene facility to
> store the original field value alongside the indexed value
> (TextField.TYPE_STORED). This would be similar to how LARQ worked. If the
> langField setting was in use, the language field would hold the language tag as
> well. If not, the returned literals would not have a language tag (in the above
> example, the value would be "My resource").
>
> Typed literals should work as well.
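
A rough sketch of the round trip described above, in case it helps the discussion. This is a plain-Java model only, not jena-text or Lucene code: a Map stands in for the stored fields of one index document (the part TextField.TYPE_STORED would provide), and the class name, the field names "label" and "lang", and both methods are made up for illustration. It only shows the language-tag behaviour; typed literals are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model only: a Map stands in for the stored fields of one
// index document. No Lucene or Jena classes are used here.
final class StoredLiteralSketch {

    // Hypothetical field names, chosen for this illustration.
    static final String TEXT_FIELD = "label";
    static final String LANG_FIELD = "lang";

    // Model of indexing: keep the original lexical form, and keep the
    // language tag only when a language field is configured.
    static Map<String, String> store(String lexical, String lang, boolean langFieldEnabled) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(TEXT_FIELD, lexical);
        if (langFieldEnabled && lang != null && !lang.isEmpty()) {
            doc.put(LANG_FIELD, lang);
        }
        return doc;
    }

    // Model of retrieval: rebuild the literal in Turtle-like syntax from
    // the stored values, without going back to the RDF store.
    static String retrieve(Map<String, String> doc) {
        String lexical = doc.get(TEXT_FIELD);
        String escaped = lexical.replace("\\", "\\\\").replace("\"", "\\\"");
        String lang = doc.get(LANG_FIELD);
        return lang == null
                ? "\"" + escaped + "\""
                : "\"" + escaped + "\"@" + lang;
    }

    public static void main(String[] args) {
        // With a language field the tag survives the round trip.
        System.out.println(retrieve(store("My resource", "en", true)));
        // Without one, the literal comes back untagged.
        System.out.println(retrieve(store("My resource", "en", false)));
    }
}
```

The point of the model is that retrieval never consults the original graph: whatever is not stored in the index (here, the tag when no language field is configured) cannot be returned.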
I remember some gotchas where bits of the code believed that what came
out of the index could only be a non-blank resource, but it was fixable
and presumably you've already spotted that.
[Hmm, where /did/ I put that code?]
Chris
--
"You work with mad scientists and you're surprised at a talking /cat/?"
/Girl Genius/
Epimorphics Ltd, http://www.epimorphics.com