[ 
https://issues.apache.org/jira/browse/LUCENE-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597026#comment-13597026
 ] 

Varun Thacker commented on LUCENE-4817:
---------------------------------------

Really useful token filter. 

You've mentioned that a user should combine this with a 
RemoveDuplicatesTokenFilter. That is needed because any word the stemmer 
leaves unchanged would otherwise appear twice at the same position.

So the example in the Javadocs for KeywordRepeatFilterFactory.java should 
include RemoveDuplicatesTokenFilter. 

{code:xml} 
/**
 * Factory for {@link KeywordRepeatFilter}.
 * <pre class="prettyprint" >
 * &lt;fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100"&gt;
 *   &lt;analyzer&gt;
 *     &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
 *     &lt;filter class="solr.KeywordRepeatFilterFactory"/&gt;
 *     &lt;filter class="solr.PorterStemFilterFactory"/&gt;
 *     &lt;filter class="solr.RemoveDuplicatesTokenFilterFactory"/&gt;
 *   &lt;/analyzer&gt;
 * &lt;/fieldType&gt;</pre>
 */
{code} 
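
To illustrate why the last filter in that chain matters, here is a toy model of the three steps in plain Java. It is only a sketch and does not use the Lucene API: the class KeywordRepeatSketch, its Token type and the crude suffix-stripping "stemmer" are all made up for illustration, and the real filters may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of repeat -> stem -> remove duplicates. Not Lucene API. */
final class KeywordRepeatSketch {

    /** A term, its position increment (0 = same position as previous), and a keyword flag. */
    static final class Token {
        final String term;
        final int posInc;
        final boolean keyword;

        Token(String term, int posInc, boolean keyword) {
            this.term = term;
            this.posInc = posInc;
            this.keyword = keyword;
        }
    }

    /** Emit each word twice: once flagged as keyword, once plain at the same position. */
    static List<Token> repeat(List<String> words) {
        List<Token> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Token(w, 1, true));
            out.add(new Token(w, 0, false));
        }
        return out;
    }

    /** Hypothetical stand-in for a stemmer: strips a trailing "ing", skips keyword tokens. */
    static List<Token> stem(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            boolean stemmable = !t.keyword && t.term.length() > 5 && t.term.endsWith("ing");
            String term = stemmable ? t.term.substring(0, t.term.length() - 3) : t.term;
            out.add(new Token(term, t.posInc, t.keyword));
        }
        return out;
    }

    /** Drop a token if the same term was already seen at the same position. */
    static List<String> removeDuplicates(List<Token> in) {
        List<String> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        int position = 0;
        for (Token t : in) {
            if (t.posInc > 0) {
                seenAtPosition.clear();
                position += t.posInc;
            }
            if (seenAtPosition.add(t.term)) {
                out.add(position + ":" + t.term);
            }
        }
        return out;
    }

    /** Run the whole toy chain and return "position:term" strings. */
    static List<String> analyze(List<String> words) {
        return removeDuplicates(stem(repeat(words)));
    }

    public static void main(String[] args) {
        // "jumping" keeps both forms; "fast" is not stemmed, so its copy is dropped.
        System.out.println(analyze(Arrays.asList("jumping", "fast")));
        // prints [1:jumping, 1:jump, 2:fast]
    }
}
```

Without the removeDuplicates step, "fast" would be emitted twice at position 2, which is exactly the case the Javadoc example should guard against.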
                
> Add KeywordRepeaterFilter to emit tokens twice once as keyword and once not 
> as keyword
> --------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4817
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4817
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.1
>            Reporter: Simon Willnauer
>            Priority: Minor
>             Fix For: 5.0, 4.3
>
>         Attachments: LUCENE-4817.patch, LUCENE-4817.patch
>
>
> If you want both a stemmed and an unstemmed version of a token, one for 
> recall and one for precision, today you need two fields in most cases. 
> Yet most stemmers respect the keyword attribute, so we could add a token 
> filter that emits the same token twice, once as keyword and once plain. 
> Folks would most likely need to combine this with 
> RemoveDuplicatesTokenFilter, but that way we can have the stemmed and 
> unstemmed versions in the same field.
