[ https://issues.apache.org/jira/browse/JENA-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617866#comment-15617866 ]
Osma Suominen commented on JENA-1250:
-------------------------------------
Right. The old code relies on an updateDocument() overload that takes an
Analyzer parameter, but that overload was removed in Lucene 5; see
[LUCENE-6212|https://issues.apache.org/jira/browse/LUCENE-6212] for the
details. That JIRA ticket also has some discussion about the alternatives.
The official advice (from the [Lucene 5.5.3 migration
guide|https://lucene.apache.org/core/5_5_3/MIGRATE.html]) is: "Instead, you
should break out text into separate fields and use a different analyzer for
each field with PerFieldAnalyzerWrapper."
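To make the migration guide's advice concrete: PerFieldAnalyzerWrapper holds a default analyzer plus a map from field name to analyzer, and chooses the analyzer by field name. The Java sketch below is a stdlib-only model of that lookup rule, not Lucene code. The "analyzers" are toy tokenizers (the "English" one only strips a plural "s"), and the names PerFieldAnalysis, simple and toyEnglish are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Toy model of per-field analysis (hypothetical; not the Lucene API). */
final class PerFieldAnalysis {
    /** Default toy "analyzer": lowercase, then split on runs of non-letters. */
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // skip the empty token produced by a leading separator
            }
        }
        return tokens;
    }

    /** Hypothetical "English" analyzer: simple() plus a naive plural-stripping rule. */
    static List<String> toyEnglish(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : simple(text)) {
            boolean plural = t.length() > 3 && t.endsWith("s");
            tokens.add(plural ? t.substring(0, t.length() - 1) : t);
        }
        return tokens;
    }

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> fieldAnalyzers;

    PerFieldAnalysis(Function<String, List<String>> defaultAnalyzer,
                     Map<String, Function<String, List<String>>> fieldAnalyzers) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.fieldAnalyzers = new HashMap<>(fieldAnalyzers);
    }

    /** Same lookup rule as PerFieldAnalyzerWrapper: the field's own analyzer, else the default. */
    List<String> analyze(String field, String text) {
        return fieldAnalyzers.getOrDefault(field, defaultAnalyzer).apply(text);
    }
}
```

The analysis then depends on the field rather than on the document. With a layout like label_en, label_fi, ..., registering one analyzer per language-specific field replaces the removed per-document Analyzer parameter. Fields with no registered analyzer fall back to the default.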
It would be possible to do that here, but then I think it would be a bit
difficult to support both the language-specific and non-language-specific
searches that TextIndexLuceneMultilingual currently supports. One option would
be to index using both language-specific (label_en) and non-language-specific
(label) fields in parallel, and then choose the target field based on whether
the query specifies the language or not.
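The parallel-field option could look roughly like this. It is a sketch with hypothetical names (MultilingualFields, langField, fieldsToIndex, queryField) and assumes the `<base>_<lang>` naming suggested above (label_en). In a real implementation, each entry in the returned map would become a Lucene field, and each label_<lang> field would be analyzed by that language's analyzer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for a "label" + "label_<lang>" field layout. */
final class MultilingualFields {
    private MultilingualFields() {}

    /** Language-specific field name, e.g. ("label", "en") -> "label_en". */
    static String langField(String baseField, String lang) {
        return baseField + "_" + lang;
    }

    /**
     * Fields to index for one literal: always the non-language-specific field,
     * plus the language-specific one when the literal carries a language tag.
     */
    static Map<String, String> fieldsToIndex(String baseField, String text, String lang) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(baseField, text);
        if (lang != null && !lang.isEmpty()) {
            fields.put(langField(baseField, lang), text);
        }
        return fields;
    }

    /** Query target: the language-specific field if the query names a language, else the plain field. */
    static String queryField(String baseField, String queryLang) {
        boolean noLang = queryLang == null || queryLang.isEmpty();
        return noLang ? baseField : langField(baseField, queryLang);
    }
}
```

The trade-off of this layout is that each language-tagged literal is indexed twice. In return, language-specific and language-agnostic queries can both be answered from the same index.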
Apparently there is another solution based on using language-specific
TokenStreams, hinted at in the LUCENE-6212 discussion, or possibly a third one
based on mutable Analyzers, but I'm not sure whether it's a good idea to take
the code in this direction. Since the Lucene 5 upgrade will break index
compatibility anyway, changing the multilingual analyzer implementation to use a
different field layout shouldn't matter.
I can take a shot at changing the multilingual index implementation to use
language-specific fields, but it may take a few days. Meanwhile, could you
please clean up the comments you left behind when updating the code to match
the API changes? I see quite a few "// jmv" comments that shouldn't be part of
the final patch to merge. The same goes for the commented-out versions in
pom.xml. Git history provides enough information about who changed the code
and why; there is no need to keep obsolete code in comments.
> Upgrade text search to latest Lucene
> ------------------------------------
>
> Key: JENA-1250
> URL: https://issues.apache.org/jira/browse/JENA-1250
> Project: Apache Jena
> Issue Type: Improvement
> Components: Jena
> Reporter: Jean-Marc Vanel
>
> We are currently at Lucene 4.9.1,
> which is quite outdated compared to the latest Lucene, 6.2.1.
> Note that there is a project to add a simple completion feature in addition to
> the existing simple search.
> But it would be better to do that on an updated Lucene dependency.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)