[jira] [Commented] (SOLR-2800) RemoveDuplicatesTokenFilterFactory can not remove the duplicated term

Steven Rowe (Commented) (JIRA) Thu, 29 Sep 2011 07:35:10 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117321#comment-13117321 ]


Steven Rowe commented on SOLR-2800:
-----------------------------------

I agree the set is needed, but the current code clones and then adds every 
single term to the set, regardless of whether it's already in the set.
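
To make the point concrete, here is a minimal, self-contained sketch of the
"copy only when the term is new" idea. It is not the Lucene/Solr code: the
class RemoveDuplicatesSketch, its Token type and the removeDuplicates method
are hypothetical stand-ins, a plain HashSet<String> replaces the char-array
set used by the real filter, and a list of tokens replaces the TokenStream.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the filter; not part of Lucene or Solr.
class RemoveDuplicatesSketch {

    /** Simplified token: term text plus its position increment. */
    static final class Token {
        final String term;
        final int posIncrement;

        Token(String term, int posIncrement) {
            this.term = term;
            this.posIncrement = posIncrement;
        }
    }

    /** Returns the terms that survive duplicate removal, in order. */
    static List<String> removeDuplicates(List<Token> input) {
        Set<String> previous = new HashSet<>(); // terms seen at the current position
        List<String> output = new ArrayList<>();

        for (Token token : input) {
            // A positive increment starts a new position, so forget earlier terms.
            if (token.posIncrement > 0) {
                previous.clear();
            }
            // Set.add returns false when the term is already present, so a
            // duplicate is skipped without being stored a second time.
            if (previous.add(token.term)) {
                output.add(token.term);
            }
        }
        return output;
    }
}
```

In the real filter the equivalent change would be to clone the term buffer
and add it to the set only when the duplicate check fails, and to keep
looping (rather than returning) when a duplicate is found.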
                
> RemoveDuplicatesTokenFilterFactory can not remove the duplicated term
> ---------------------------------------------------------------------
>
>                 Key: SOLR-2800
>                 URL: https://issues.apache.org/jira/browse/SOLR-2800
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.4
>         Environment: Windows
>            Reporter: Han Hui Wen 
>              Labels: RemoveDuplicatesTokenFilterFactory, Solr
>             Fix For: 3.5
>
>
> RemoveDuplicatesTokenFilterFactory does not remove duplicated terms.
> The current code is in
> http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_4/solr/core/src/java/org/apache/solr/analysis/RemoveDuplicatesTokenFilter.java?view=markup
> @Override
> public boolean incrementToken() throws IOException {
>   while (input.incrementToken()) {
>     final char term[] = termAttribute.buffer();
>     final int length = termAttribute.length();
>     final int posIncrement = posIncAttribute.getPositionIncrement();
>
>     if (posIncrement > 0) {
>       previous.clear();
>     }
>
>     boolean duplicate = (posIncrement == 0 && previous.contains(term, 0, length));
>
>     // clone the term, and add to the set of seen terms.
>     char saved[] = new char[length];
>     System.arraycopy(term, 0, saved, 0, length);
>     previous.add(saved);
>
>     if (!duplicate) {
>       return true;
>     }
>   }
>   return false;
> }
> It should be like the following:
> @Override
> public boolean incrementToken() throws IOException {
>   while (input.incrementToken()) {
>     final char term[] = termAttribute.buffer();
>     final int length = termAttribute.length();
>     final int posIncrement = posIncAttribute.getPositionIncrement();
>
>     if (posIncrement > 0) {
>       previous.clear();
>     }
>
>     boolean duplicate = (posIncrement == 0 && previous.contains(term, 0, length));
>
>     if (duplicate) {
>       return false;
>     } else {
>       // clone the term, and add to the set of seen terms.
>       char saved[] = new char[length];
>       System.arraycopy(term, 0, saved, 0, length);
>       previous.add(saved);
>     }
>   }
>   return true;
> }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
