[jira] [Commented] (JENA-776) LowerCaseKeywordAnalyzer for jena-text

ASF GitHub Bot (JIRA) Wed, 29 Apr 2015 03:14:04 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518877#comment-14518877
 ]


ASF GitHub Bot commented on JENA-776:
-------------------------------------

Github user osma commented on the pull request:

    https://github.com/apache/jena/pull/52#issuecomment-97329735
  
    > 1. I'm not familiar with assembler configuration. But if you want to give some help ;-)
    
    I'll try. I've done two jena-text patches in the past, and in both cases I 
added support for assembler configuration.
    
    For your patch I think it would be useful to be able to enable/disable 
multilingual indexing in a particular jena-text index (default should be to 
disable it, for backwards compatibility). Adjusting the particular 
language-specific indexers, as I originally suggested, is not very important at 
this point.
    
    Thinking about assembler configuration, I think it would be easiest to plug 
this in as an alternative to the current Analyzer variants (StandardAnalyzer, 
SimpleAnalyzer, KeywordAnalyzer, LowerCaseKeywordAnalyzer). You can look at my 
patch in [JENA-776](https://issues.apache.org/jira/browse/JENA-776) that added 
the LowerCaseKeywordAnalyzer variant. Basically you need to create a new class 
such as MultilingualAnalyzerAssembler (similar to the other *AnalyzerAssembler 
classes) and plug support for it into TextAssembler. It shouldn't be very 
difficult...
    
    > 2. Ok, I will refactor it to leave previous signatures and calls.
    > 3. Sure, it's more clean to extend Entity... ok, todo list. 
    
    Excellent!
    
    > For the tests and doc, I'm pretty busy at the moment.
    
    I can't speak for Jena officially as I'm just an occasional contributor 
with an interest in jena-text, but Jena has very good unit test coverage and I 
think unit tests are expected from new contributions as well. If you won't 
write unit tests for this, I bet nobody else will... Again, it's not very hard; 
you can look at my LowerCaseKeywordAnalyzer patch for an example.
    
    Regarding documentation, I think that what's needed is to update the main 
jena-text document, particularly the [Configuring an 
Analyzer](https://jena.apache.org/documentation/query/text-query.html#configuring-an-analyzer)
 section. I'm not 100% sure how it is technically maintained these days, but it 
used to be maintained via the CMS that [you can 
use](http://www.apache.org/dev/cmsref#non-committer) to provide a documentation 
patch. But I think it should also be fine to just provide an update as a 
comment here on GitHub. Again, see JENA-776 for an example; there I just wrote 
up the small change to the documentation as a comment and @afs picked it up 
from there.



> LowerCaseKeywordAnalyzer for jena-text
> --------------------------------------
>
>                 Key: JENA-776
>                 URL: https://issues.apache.org/jira/browse/JENA-776
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Text
>            Reporter: Osma Suominen
>            Assignee: Andy Seaborne
>             Fix For: Jena 2.12.1
>
>         Attachments: jena-text-lowercase-keyword-analyzer.patch
>
>
> I liked the option to specify Analyzer for jena-text, as implemented in 
> JENA-654. But I'd like to use an analyzer that is otherwise like 
> KeywordAnalyzer but case-insensitive, for use in an autocomplete/typeahead UI 
> widget. Lucene doesn't include such an analyzer, but there are several 
> implementations of the same idea, e.g. in neo4j [1] and stargate [2].
> I created my own implementation of such an analyzer and added code to use it 
> from the assembler. Patch attached.
> This analyzer is now in a new package org.apache.jena.query.text.analyzer, in 
> case other analyzers for jena-text will appear in the future. If you don't 
> like the new package, the class can of course be moved to 
> org.apache.jena.query.text.
> I also added a test for case-insensitivity. To avoid lots of duplicate 
> boilerplate code, I slightly modified and subclassed the existing test for 
> KeywordAnalyzer.
> I'd love to see this in the next version of jena-text and Fuseki. Of course 
> I'll rework the patch if necessary. I can also tweak the web documentation to 
> mention this analyzer.
> -Osma
> [1] 
> https://github.com/apatry/neo4j-lucene4-index/blob/master/src/main/java/org/neo4j/index/impl/lucene/LowerCaseKeywordAnalyzer.java
> [2] 
> https://github.com/tuplejump/stargate-core/blob/master/src/main/java/com/tuplejump/stargate/lucene/CaseInsensitiveKeywordAnalyzer.java
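
The behavior the issue asks for (keep the whole field value as one token, as KeywordAnalyzer does, but ignore case) can be illustrated with a minimal plain-Java sketch. This is not the Lucene-based class from the attached patch: the class name LowerCaseKeywordSketch and the methods analyze and matchesPrefix are hypothetical names chosen for illustration, and the prefix check only stands in for what an autocomplete widget might do with the indexed token.

```java
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; NOT the Lucene analyzer from the patch. */
class LowerCaseKeywordSketch {

    /** Keyword-style analysis: the whole input stays a single token, lowercased. */
    static List<String> analyze(String value) {
        return List.of(value.toLowerCase(Locale.ROOT));
    }

    /** Case-insensitive prefix match of typed text against a stored label. */
    static boolean matchesPrefix(String label, String typed) {
        String token = analyze(label).get(0);   // what would be indexed
        String prefix = analyze(typed).get(0);  // what the user typed so far
        return token.startsWith(prefix);
    }
}
```

With this sketch, a label such as "New York" is indexed as the single token "new york", so typing "new y" matches it while typing "york" does not, because the value is never split into words.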



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
