Osma Suominen created JENA-776:
----------------------------------
Summary: LowerCaseKeywordAnalyzer for jena-text
Key: JENA-776
URL: https://issues.apache.org/jira/browse/JENA-776
Project: Apache Jena
Issue Type: Improvement
Components: Text
Reporter: Osma Suominen

I like the option to specify an Analyzer for jena-text, as implemented in
JENA-654. However, I'd like to use an analyzer that behaves like
KeywordAnalyzer but is case-insensitive, for use in an autocomplete/typeahead UI
widget. Lucene doesn't include such an analyzer, but there are several
implementations of the same idea, e.g. in neo4j [1] and stargate [2].

I wrote my own implementation of such an analyzer and added code so that it
can be selected from the assembler. Patch attached.
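
To make the intended behaviour concrete, here is a minimal plain-Java sketch. It is not the code from the patch and does not use Lucene at all. It only illustrates the two properties I'm after: the whole input is kept as a single token (as with KeywordAnalyzer), and that token is lowercased. The class name LowerCaseKeywordSketch and the complete() helper are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: a plain-Java stand-in, not a Lucene Analyzer.
public class LowerCaseKeywordSketch {

    // Keyword-style analysis: the entire input stays one token (no splitting
    // on whitespace or punctuation). The only change is lowercasing.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Hypothetical autocomplete lookup: keep the labels whose analyzed form
    // starts with the analyzed form of what the user has typed so far.
    static List<String> complete(List<String> labels, String typed) {
        String prefix = analyze(typed);
        List<String> hits = new ArrayList<>();
        for (String label : labels) {
            if (analyze(label).startsWith(prefix)) {
                hits.add(label);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("New York", "new media", "York Minster");
        // Matches "New York" and "new media" regardless of case,
        // but not "York Minster", because the label is one single token.
        System.out.println(complete(labels, "NEW "));
    }
}
```

Because the label is never split into words, a prefix such as "york" only matches labels that begin with it, which is what a typeahead widget usually wants.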

The analyzer lives in a new package, org.apache.jena.query.text.analyzer, in
case other analyzers for jena-text appear in the future. If you don't like
the new package, the class can of course be moved to org.apache.jena.query.text.

I also added a test for case-insensitivity. To avoid lots of duplicate
boilerplate code, I slightly modified and subclassed the existing test for
KeywordAnalyzer.

I'd love to see this in the next version of jena-text and Fuseki. Of course
I'll rework the patch if necessary. I can also tweak the web documentation to
mention this analyzer.

-Osma

[1] https://github.com/apatry/neo4j-lucene4-index/blob/master/src/main/java/org/neo4j/index/impl/lucene/LowerCaseKeywordAnalyzer.java
[2] https://github.com/tuplejump/stargate-core/blob/master/src/main/java/com/tuplejump/stargate/lucene/CaseInsensitiveKeywordAnalyzer.java