[ 
https://issues.apache.org/jira/browse/LUCENE-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658308#action_12658308
 ] 

Hoss Man commented on LUCENE-1491:
----------------------------------

patch looks good ... the one question i have is whether the fix meets user 
expectations: the patch as posted "skips" any input tokens that are shorter 
than the minimum ngram length ... is that what most people will expect, or will 
people expect shorter tokens to be passed through?

i.e.: should "min" be the minimum token size produced by the filter (a hard 
min), or should it be the minimum ngram size produced by the filter (a soft 
min)?
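
to make the two options concrete, here is a rough standalone sketch in plain 
Java. it does not use the Lucene TokenFilter API and is not the patch itself; 
the class name, the method name and the passShortTokens flag are all made up 
for illustration, and it only models front-edge grams:

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramSemantics {

    /**
     * Produces front-edge n-grams of sizes min..max for each token.
     * passShortTokens == false: tokens shorter than min are skipped ("hard min").
     * passShortTokens == true:  tokens shorter than min are emitted unchanged ("soft min").
     * Hypothetical helper for illustration only, not Lucene code.
     */
    public static List<String> edgeNGrams(List<String> tokens, int min, int max,
                                          boolean passShortTokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() < min) {
                if (passShortTokens) {
                    out.add(token); // soft min: pass the short token through as-is
                }
                continue; // in both cases, keep consuming the rest of the stream
            }
            int limit = Math.min(max, token.length());
            for (int n = min; n <= limit; n++) {
                out.add(token.substring(0, n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("ab", "lucene", "x", "ngram");
        System.out.println(edgeNGrams(tokens, 3, 4, false)); // hard min
        System.out.println(edgeNGrams(tokens, 3, 4, true));  // soft min
    }
}
```

with min=3 and max=4, the hard-min call drops "ab" and "x" entirely, while the 
soft-min call keeps them in the output alongside the grams of the longer 
tokens.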

either way this patch is an improvement, i'm just wondering what we want to 
define the semantics to be (or if we want to make an additional option for this)

> EdgeNGramTokenFilter stops on tokens smaller than minimum gram size.
> --------------------------------------------------------------------
>
>                 Key: LUCENE-1491
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1491
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.4, 2.4.1, 2.9, 3.0
>            Reporter: Todd Feak
>         Attachments: LUCENE-1491.patch
>
>
> If a token is encountered in the stream that is shorter in length than the 
> min gram size, the filter will stop processing the token stream.
> Working up a unit test now, but may be a few days before I can provide it. 
> Wanted to get it in the system.
