[jira] [Commented] (SOLR-2800) RemoveDuplicatesTokenFilterFactory can not remove the duplicated term

Steven Rowe (Commented) (JIRA) Thu, 29 Sep 2011 07:15:11 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117312#comment-13117312 ]


Steven Rowe commented on SOLR-2800:
-----------------------------------

Can you provide a test case?  Your version seems more efficient, since 
duplicates are not placed in the CharArraySet, but the functionality looks the 
same to me as the original.  Your test case should demonstrate what is failing 
without your changes, and that your changes fix the problem.

Please provide code/tests in the form of a patch - it's much easier to see what 
you have changed/added.  Also, it's very important when you provide code that 
you attach a patch to the issue and click where it says "Grant license to ASF 
for inclusion in ASF works (as per the Apache License §5)" - unless you do 
this, we can't use your code.


                
> RemoveDuplicatesTokenFilterFactory can not remove the duplicated term
> ---------------------------------------------------------------------
>
>                 Key: SOLR-2800
>                 URL: https://issues.apache.org/jira/browse/SOLR-2800
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.4
>         Environment: Windows
>            Reporter: Han Hui Wen 
>              Labels: RemoveDuplicatesTokenFilterFactory, Solr
>             Fix For: 3.5
>
>
> RemoveDuplicatesTokenFilterFactory does not remove duplicated terms.
> The current implementation, in
> http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_4/solr/core/src/java/org/apache/solr/analysis/RemoveDuplicatesTokenFilter.java?view=markup
> is:
> @Override
> public boolean incrementToken() throws IOException {
>   while (input.incrementToken()) {
>     final char term[] = termAttribute.buffer();
>     final int length = termAttribute.length();
>     final int posIncrement = posIncAttribute.getPositionIncrement();
>
>     if (posIncrement > 0) {
>       previous.clear();
>     }
>
>     boolean duplicate = (posIncrement == 0 && previous.contains(term, 0, length));
>
>     // clone the term, and add to the set of seen terms.
>     char saved[] = new char[length];
>     System.arraycopy(term, 0, saved, 0, length);
>     previous.add(saved);
>
>     if (!duplicate) {
>       return true;
>     }
>   }
>   return false;
> }
> It should be like the following:
> @Override
> public boolean incrementToken() throws IOException {
>   while (input.incrementToken()) {
>     final char term[] = termAttribute.buffer();
>     final int length = termAttribute.length();
>     final int posIncrement = posIncAttribute.getPositionIncrement();
>     if (posIncrement > 0) {
>       previous.clear();
>     }
>     boolean duplicate = (posIncrement == 0 && previous.contains(term, 0, length));
>
>     if (duplicate) {
>       return false;
>     } else {
>       // clone the term, and add to the set of seen terms.
>       char saved[] = new char[length];
>       System.arraycopy(term, 0, saved, 0, length);
>       previous.add(saved);
>     }
>   }
>   return true;
> }
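
The decision logic of the 3.4 loop quoted above can be exercised without Lucene, which may help when putting together the requested test case. What follows is only a minimal sketch, not the actual filter: the `Token` class, the `filter` method, and the `HashSet` standing in for the `CharArraySet` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RemoveDuplicatesSketch {

    /** Hypothetical stand-in for a token: term text plus its position increment. */
    static final class Token {
        final String term;
        final int posIncrement;

        Token(String term, int posIncrement) {
            this.term = term;
            this.posIncrement = posIncrement;
        }
    }

    /** Applies the decision logic of the quoted 3.4 loop to a whole list of tokens. */
    static List<String> filter(List<Token> input) {
        List<String> kept = new ArrayList<>();
        Set<String> previous = new HashSet<>(); // assumption: stands in for the CharArraySet
        for (Token token : input) {
            if (token.posIncrement > 0) {
                previous.clear(); // new position: forget the terms seen so far
            }
            boolean duplicate = token.posIncrement == 0 && previous.contains(token.term);
            previous.add(token.term);
            if (!duplicate) {
                kept.add(token.term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "a" twice at the same position, then "b", then "a" again at a new position.
        List<Token> input = Arrays.asList(
            new Token("a", 1), new Token("a", 0), new Token("b", 1), new Token("a", 1));
        System.out.println(filter(input)); // prints [a, b, a]
    }
}
```

In this sketch a repeated term is dropped only when it arrives with a position increment of 0. The same term at a later position is kept, because `previous` is cleared whenever the increment is positive.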


