[
https://issues.apache.org/jira/browse/JENA-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518877#comment-14518877
]
ASF GitHub Bot commented on JENA-776:
-------------------------------------
Github user osma commented on the pull request:
https://github.com/apache/jena/pull/52#issuecomment-97329735
> 1. I'm not familiar with assembler configuration. But if you want to give
> some help ;-)
I'll try. I've done two jena-text patches in the past, and in both cases I
added support for assembler configuration.
For your patch I think it would be useful to be able to enable/disable
multilingual indexing in a particular jena-text index (default should be to
disable it, for backwards compatibility). Adjusting the particular
language-specific indexers, as I originally suggested, is not very important at
this point.
For the assembler configuration, I think it would be easiest to plug
this in as an alternative to the current Analyzer variants (StandardAnalyzer,
SimpleAnalyzer, KeywordAnalyzer, LowerCaseKeywordAnalyzer). You can look at my
patch in [JENA-776](https://issues.apache.org/jira/browse/JENA-776) that added
the LowerCaseKeywordAnalyzer variant. Basically you need to create a new class
such as MultilingualAnalyzerAssembler (similar to the other *AnalyzerAssembler
classes) and plug support for it into TextAssembler. It shouldn't be very
difficult...
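As a rough illustration of the "one more variant next to the existing ones" idea, here is a dependency-free sketch. It is not jena-text code: `AnalyzerRegistrySketch`, `register` and `lookup` are made-up names, the type name `MultilingualAnalyzer` is an assumption, and an "analyzer" is reduced to a function from a literal to the single term that would be indexed. The real thing would be an assembler class wired into TextAssembler.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class AnalyzerRegistrySketch {
    // Hypothetical stand-in for the set of analyzer variants the assembler knows.
    // Key: the type name used in the configuration; value: a simplified "analyzer"
    // that maps a literal to one indexed term (real analyzers emit token streams).
    static final Map<String, Function<String, String>> VARIANTS = new LinkedHashMap<>();

    static {
        // Keyword-style: whole literal is one term, case preserved.
        VARIANTS.put("KeywordAnalyzer", s -> s);
        // Lower-case keyword style: whole literal is one term, lower-cased.
        VARIANTS.put("LowerCaseKeywordAnalyzer", s -> s.toLowerCase(Locale.ROOT));
    }

    // Hypothetical equivalent of plugging a new variant into the assembler.
    static void register(String typeName, Function<String, String> analyzer) {
        VARIANTS.put(typeName, analyzer);
    }

    // Resolve the variant named in the configuration; unknown names are rejected.
    static Function<String, String> lookup(String typeName) {
        Function<String, String> analyzer = VARIANTS.get(typeName);
        if (analyzer == null) {
            throw new IllegalArgumentException("Unknown analyzer type: " + typeName);
        }
        return analyzer;
    }

    public static void main(String[] args) {
        // Placeholder behaviour only; the actual multilingual logic lives in the PR.
        register("MultilingualAnalyzer", s -> s.toLowerCase(Locale.ROOT));

        // Existing variants keep resolving exactly as before the registration.
        System.out.println(lookup("KeywordAnalyzer").apply("Helsinki University"));
        System.out.println(lookup("LowerCaseKeywordAnalyzer").apply("Helsinki University"));
    }
}
```

The point of the sketch is that the new variant is only used when a configuration names it explicitly, so existing index configurations are unaffected.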
> 2. Ok, I will refactor it to leave previous signatures and calls.
> 3. Sure, it's more clean to extend Entity... ok, todo list.
Excellent!
> For the tests and doc, I'm pretty busy at the moment.
I can't speak for Jena officially as I'm just an occasional contributor
with an interest in jena-text, but Jena has very good unit test coverage and I
think unit tests are expected from new contributions as well. If you won't
write unit tests for this, I bet nobody else will... Again it's not very hard,
you can look at my LowerCaseKeywordAnalyzer patch for an example.
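To give an idea of what such a test checks, here is a minimal sketch that runs without Lucene or the jena-text test harness. Everything in it (`CaseInsensitivitySketch`, `TinyIndex`, the sample literal) is invented for illustration; the actual test in the patch subclasses the existing KeywordAnalyzer test instead.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

public class CaseInsensitivitySketch {
    // Tiny in-memory "index": each literal becomes exactly one term, normalised
    // the same way at index time and at query time.
    static final class TinyIndex {
        private final UnaryOperator<String> normalizer;
        private final Set<String> terms = new HashSet<>();

        TinyIndex(UnaryOperator<String> normalizer) {
            this.normalizer = normalizer;
        }

        void add(String literal) {
            terms.add(normalizer.apply(literal));
        }

        boolean matches(String query) {
            return terms.contains(normalizer.apply(query));
        }
    }

    // Keyword-style behaviour: one term per literal, case preserved.
    static TinyIndex keywordIndex() {
        return new TinyIndex(s -> s);
    }

    // Lower-case keyword behaviour: one term per literal, lower-cased.
    static TinyIndex lowerCaseKeywordIndex() {
        return new TinyIndex(s -> s.toLowerCase(Locale.ROOT));
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        TinyIndex exact = keywordIndex();
        exact.add("Feeling a tESt");
        TinyIndex folded = lowerCaseKeywordIndex();
        folded.add("Feeling a tESt");

        check(exact.matches("Feeling a tESt"), "exact case should match");
        check(!exact.matches("feeling a test"), "keyword matching is case-sensitive");
        check(folded.matches("feeling a test"), "lower-cased variant ignores case");
        // Still a single term: a partial word does not match.
        check(!folded.matches("test"), "no tokenisation expected");
        System.out.println("all checks passed");
    }
}
```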
Regarding documentation, I think that what's needed is to update the main
jena-text document, particularly the [Configuring an
Analyzer](https://jena.apache.org/documentation/query/text-query.html#configuring-an-analyzer)
section. I'm not 100% sure how it is technically maintained these days, but it
used to be maintained via the CMS that [you can
use](http://www.apache.org/dev/cmsref#non-committer) to provide a documentation
patch. But I think it should also be fine to just provide an update as a
comment here on GitHub. Again, see JENA-776 for an example: there I just wrote
up the small change to the documentation as a comment, and @afs picked it up
from there.
> LowerCaseKeywordAnalyzer for jena-text
> --------------------------------------
>
> Key: JENA-776
> URL: https://issues.apache.org/jira/browse/JENA-776
> Project: Apache Jena
> Issue Type: Improvement
> Components: Text
> Reporter: Osma Suominen
> Assignee: Andy Seaborne
> Fix For: Jena 2.12.1
>
> Attachments: jena-text-lowercase-keyword-analyzer.patch
>
>
> I liked the option to specify Analyzer for jena-text, as implemented in
> JENA-654. But I'd like to use an analyzer that is otherwise like
> KeywordAnalyzer but case-insensitive, for use in an autocomplete/typeahead UI
> widget. Lucene doesn't include such an analyzer, but there are several
> implementations of the same idea, e.g. in neo4j [1] and stargate [2].
> I created my own implementation of such an analyzer and added code to use it
> from the assembler. Patch attached.
> This analyzer is now in a new package org.apache.jena.query.text.analyzer, in
> case other analyzers for jena-text will appear in the future. If you don't
> like the new package, the class can of course be moved to
> org.apache.jena.query.text.
> I also added a test for case-insensitivity. To avoid lots of duplicate
> boilerplate code, I slightly modified and subclassed the existing test for
> KeywordAnalyzer.
> I'd love to see this in the next version of jena-text and Fuseki. Of course
> I'll rework the patch if necessary. I can also tweak the web documentation to
> mention this analyzer.
> -Osma
> [1]
> https://github.com/apatry/neo4j-lucene4-index/blob/master/src/main/java/org/neo4j/index/impl/lucene/LowerCaseKeywordAnalyzer.java
> [2]
> https://github.com/tuplejump/stargate-core/blob/master/src/main/java/com/tuplejump/stargate/lucene/CaseInsensitiveKeywordAnalyzer.java