[jira] [Commented] (JENA-1250) Upgrade text search to latest Lucene

Osma Suominen (JIRA) Sat, 29 Oct 2016 03:08:19 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617866#comment-15617866
 ]


Osma Suominen commented on JENA-1250:
-------------------------------------

Right. The old code relies on having an updateDocument() method that takes an 
Analyzer parameter, but that was removed in Lucene 5, see 
[LUCENE-6212|https://issues.apache.org/jira/browse/LUCENE-6212] for the 
details. That JIRA ticket has some discussion about the alternatives.

The official advice (from the [Lucene 5.5.3 migration 
guide|https://lucene.apache.org/core/5_5_3/MIGRATE.html]) is: "Instead, you 
should break out text into separate fields and use a different analyzer for 
each field with PerFieldAnalyzerWrapper."
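As a rough illustration of the migration guide's advice, the sketch below maps hypothetical language-specific field names (label_en, label_fr) to their own analyzers and falls back to StandardAnalyzer for every other field, including the language-neutral label field. It assumes Lucene 6.x with the lucene-analyzers-common module on the classpath. The field names and the class and helper names are illustrative only, not what jena-text uses. The terms() helper runs the analyzer the way the indexer would, so the effect of per-field analysis is visible.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PerFieldSketch {

    // Hypothetical field layout: "label" holds language-neutral text,
    // "label_<lang>" holds the copy analyzed for that language.
    static Analyzer buildAnalyzer() {
        Map<String, Analyzer> perField = new HashMap<>();
        perField.put("label_en", new EnglishAnalyzer());
        perField.put("label_fr", new FrenchAnalyzer());
        // Fields not listed in the map (e.g. "label") use the default analyzer.
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
    }

    // Analyzes text as if it were the value of the given field and
    // collects the resulting terms.
    static List<String> terms(Analyzer analyzer, String field, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = buildAnalyzer();
        System.out.println(terms(analyzer, "label", "Running dogs"));    // [running, dogs]
        System.out.println(terms(analyzer, "label_en", "Running dogs")); // [run, dog]
    }
}
```

With this layout the analyzer no longer travels with each updateDocument() call. The wrapper is set once via new IndexWriterConfig(analyzer), and IndexWriter.updateDocument(Term, Iterable<? extends IndexableField>) picks the analyzer from each field's name.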

It would be possible to do that here, but then I think it would be a bit 
difficult to support both the language-specific and non-language-specific 
searches that TextIndexLuceneMultilingual currently supports. One option would 
be to index using both language-specific (label_en) and non-language-specific 
(label) fields in parallel, and then choose the target field based on whether 
the query specifies the language or not.
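The parallel-field option could look roughly like this in plain Java. BASE_FIELD, languageField(), fieldsToIndex() and queryField() are hypothetical helpers, not part of jena-text or Lucene. They only show the decision logic: every literal goes into label, a language-tagged literal is also copied into label_<lang>, and a query is routed to label_<lang> only when it specifies a language.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ParallelFields {

    // Hypothetical language-neutral field name.
    static final String BASE_FIELD = "label";

    // Field name for a language tag, e.g. ("label", "EN") -> "label_en".
    static String languageField(String base, String lang) {
        return base + "_" + lang.toLowerCase(Locale.ROOT);
    }

    // Index time: always fill the language-neutral field; if the literal
    // carries a language tag, also fill the matching language-specific field.
    static Map<String, String> fieldsToIndex(String text, String lang) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(BASE_FIELD, text);
        if (lang != null && !lang.isEmpty()) {
            fields.put(languageField(BASE_FIELD, lang), text);
        }
        return fields;
    }

    // Query time: target the language-specific field only when the
    // query names a language; otherwise search the neutral field.
    static String queryField(String lang) {
        return (lang == null || lang.isEmpty()) ? BASE_FIELD : languageField(BASE_FIELD, lang);
    }

    public static void main(String[] args) {
        System.out.println(fieldsToIndex("Running dogs", "en")); // {label=Running dogs, label_en=Running dogs}
        System.out.println(queryField("en"));                    // label_en
        System.out.println(queryField(null));                    // label
    }
}
```

The trade-off is index size: each language-tagged literal is indexed twice, once per field.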

Apparently there is another solution based on using language-specific 
TokenStreams, hinted at in the LUCENE-6212 discussion, or possibly a third one 
based on mutable Analyzers, but I'm not sure whether it's a good idea to take 
the code in this direction. Since the Lucene 5 upgrade will break index 
compatibility anyway, changing the multilingual analyzer implementation to use a 
different field layout shouldn't matter.
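For comparison, the TokenStream-based idea hinted at in LUCENE-6212 might be sketched as follows, assuming Lucene 6.x: the text is analyzed up front with the analyzer chosen for the literal's language, and the resulting TokenStream is passed to a TextField, so the IndexWriter indexes those tokens instead of running its own analyzer on that field. preAnalyzedField() is a hypothetical helper name, and this is only one possible reading of that hint.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

public class TokenStreamFieldSketch {

    // Analyzes the text with the analyzer picked for the literal's language
    // and wraps the resulting TokenStream in a (non-stored) TextField.
    // IndexWriter consumes this stream when the document is written.
    static Field preAnalyzedField(String name, String text, Analyzer languageAnalyzer)
            throws IOException {
        return new TextField(name, languageAnalyzer.tokenStream(name, text));
    }

    public static void main(String[] args) throws IOException {
        Analyzer english = new EnglishAnalyzer();
        // The same "label" field is used for every language; only the analysis differs.
        Field label = preAnalyzedField("label", "Running dogs", english);
        Document doc = new Document();
        doc.add(label);
        // doc would then be passed to IndexWriter.addDocument or updateDocument.
        System.out.println(label.name() + " stored=" + label.fieldType().stored()); // label stored=false
    }
}
```

Two constraints come from the Lucene API rather than from this sketch: a TokenStream-valued field cannot be stored, and the stream can only be consumed once, so the field must be rebuilt for every document written.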

I can have a go at changing the multilingual index implementation to use 
language-specific fields, but it may take a few days. Meanwhile, could you 
please clean up the comments you've left behind when updating the code to match 
the API changes? I see quite a few "// jmv" comments that shouldn't be part of 
the final patch to merge. The same goes for the commented-out versions in 
pom.xml. Git history provides enough information about who has changed the code 
and why; there is no need to keep obsolete code in comments.

> Upgrade text search to latest Lucene
> ------------------------------------
>
>                 Key: JENA-1250
>                 URL: https://issues.apache.org/jira/browse/JENA-1250
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Jena
>            Reporter: Jean-Marc Vanel
>
> We are currently at Lucene 4.9.1,
> which is quite outdated compared to the latest Lucene, 6.2.1.
> Note that there is a project to add a simple completion feature in addition to 
> the existing simple search.
> But it would be better to do that on an updated Lucene dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
