[ 
https://issues.apache.org/jira/browse/JENA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410233#comment-16410233
 ] 

Osma Suominen commented on JENA-1506:
-------------------------------------

Great work, thanks [~code-ferret]!

> Add configurable filters and tokenizers
> ---------------------------------------
>
>                 Key: JENA-1506
>                 URL: https://issues.apache.org/jira/browse/JENA-1506
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.7.0
>            Reporter: Code Ferret
>            Assignee: Code Ferret
>            Priority: Major
>             Fix For: Jena 3.7.0
>
>
> In support of [JENA-1488|https://issues.apache.org/jira/browse/JENA-1488], 
> this issue proposes to add a feature to allow including defined filters and 
> tokenizers, similar to {{DefinedAnalyzer}}, for the {{ConfigurableAnalyzer}}, 
> allowing configurable arguments such as the {{excludeChars}}. I've looked at 
> {{ConfigurableAnalyzer}} and its assembler and it should be straightforward.
> I would add tokenizer and filter definitions to {{TextIndexLucene}} similar 
> to the support for adding analyzers:
> {code:java}
>     text:defineFilters (
>         [ text:defineFilter <#foo> ; 
>           text:filter [ 
>             a text:GenericFilter ;
>             text:class "fi.finto.FoldingFilter" ;
>             text:params (
>                 [ text:paramName "excludeChars" ;
>                   text:paramType text:TypeString ; 
>                   text:paramValue "whatevercharstoexclude" ]
>                 )
>             ] ; 
>           ]
>       )
> {code}
> {{GenericFilterAssembler}} and {{GenericTokenizerAssembler}} would make use of 
> much of the code in {{GenericAnalyzerAssembler}}. The changes to 
> {{ConfigurableAnalyzer}} and {{ConfigurableAnalyzerAssembler}} are 
> straightforward and mostly involve retaining the resource URI rather than 
> extracting the localName.
> Such an addition will make it easy to create new tokenizers and filters that 
> could be dropped in by just adding the classes onto the jena/fuseki classpath 
> or by referring to ones already included in Jena (via Lucene or otherwise) 
> and putting the appropriate assembler bits in the configuration.



