[ https://issues.apache.org/jira/browse/SOLR-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117321#comment-13117321 ]
Steven Rowe commented on SOLR-2800:
-----------------------------------

I agree the set is needed, but the current code clones and then adds every single term to the set, regardless of whether it's already in the set.

> RemoveDuplicatesTokenFilterFactory can not remove the duplicated term
> ---------------------------------------------------------------------
>
>                 Key: SOLR-2800
>                 URL: https://issues.apache.org/jira/browse/SOLR-2800
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.4
>         Environment: Windows
>            Reporter: Han Hui Wen
>              Labels: RemoveDuplicatesTokenFilterFactory, Solr
>             Fix For: 3.5
>
>
> RemoveDuplicatesTokenFilterFactory does not remove the duplicated term.
> The current code in
> http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_4/solr/core/src/java/org/apache/solr/analysis/RemoveDuplicatesTokenFilter.java?view=markup
> is:
>
> @Override
> public boolean incrementToken() throws IOException {
>   while (input.incrementToken()) {
>     final char term[] = termAttribute.buffer();
>     final int length = termAttribute.length();
>     final int posIncrement = posIncAttribute.getPositionIncrement();
>
>     if (posIncrement > 0) {
>       previous.clear();
>     }
>
>     boolean duplicate = (posIncrement == 0 && previous.contains(term, 0, length));
>
>     // clone the term, and add to the set of seen terms.
>     char saved[] = new char[length];
>     System.arraycopy(term, 0, saved, 0, length);
>     previous.add(saved);
>
>     if (!duplicate) {
>       return true;
>     }
>   }
>   return false;
> }
>
> It should be like the following:
>
> @Override
> public boolean incrementToken() throws IOException {
>   while (input.incrementToken()) {
>     final char term[] = termAttribute.buffer();
>     final int length = termAttribute.length();
>     final int posIncrement = posIncAttribute.getPositionIncrement();
>
>     if (posIncrement > 0) {
>       previous.clear();
>     }
>
>     boolean duplicate = (posIncrement == 0 && previous.contains(term, 0, length));
>
>     if (duplicate) {
>       return false;
>     } else {
>       // clone the term, and add to the set of seen terms.
>       char saved[] = new char[length];
>       System.arraycopy(term, 0, saved, 0, length);
>       previous.add(saved);
>     }
>   }
>   return true;
> }

--
This message is automatically generated by JIRA.
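A minimal, Lucene-free sketch of the change Steven Rowe describes: test membership first and store a term only when it has not already been seen at the current position, instead of cloning and adding every term. The `Token` class, the `filter()` method, and the `HashSet<String>` standing in for Lucene's `CharArraySet` are hypothetical illustrations, and the list-in/list-out shape replaces the real attribute-based `incrementToken()` loop. Unlike the proposal quoted in the issue, a duplicate is simply skipped and the loop moves on, rather than ending the stream with `return false`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the duplicate-removal loop, not Lucene API.
 * Token and filter() are illustrative; HashSet<String> stands in for CharArraySet.
 */
public class DedupSketch {

    /** Hypothetical stand-in for the term and position-increment attributes. */
    static final class Token {
        final String term;
        final int posIncrement;

        Token(String term, int posIncrement) {
            this.term = term;
            this.posIncrement = posIncrement;
        }
    }

    /** Returns the surviving terms, dropping repeats stacked at the same position. */
    static List<String> filter(List<Token> input) {
        Set<String> previous = new HashSet<>(); // terms already emitted at the current position
        List<String> output = new ArrayList<>();
        for (Token token : input) {
            if (token.posIncrement > 0) {
                previous.clear(); // a new position starts: forget the earlier terms
            }
            boolean duplicate = token.posIncrement == 0 && previous.contains(token.term);
            if (!duplicate) {
                // Only first occurrences are stored, so a term already in the set
                // is never copied or added again.
                previous.add(token.term);
                output.add(token.term);
            }
            // A duplicate is skipped and the loop continues with the next input token.
        }
        return output;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("fast", 1),
                new Token("quick", 0),
                new Token("fast", 0), // same term stacked at the same position: dropped
                new Token("car", 1),
                new Token("car", 1)); // same term at a new position: kept
        System.out.println(filter(tokens)); // prints [fast, quick, car, car]
    }
}
```

With this input, the stacked "fast" is dropped, but the second "car" survives: it arrives with a position increment of 1, which clears the set, so (as in the quoted 3.4 code) only repeats at the same position (`posIncrement == 0`) are removed.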