[
https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874373#action_12874373
]
Trejkaz commented on LUCENE-2348:
---------------------------------
I attempted to write a test, but it fails by matching 0 documents rather than
the 2 I would have expected. Here is the code:
{code:java}
@Test
public void testDuplicateFilterAcrossSegments() throws Exception
{
    // Two separate single-document indexes, both holding id "1".
    RAMDirectory index1Dir = new RAMDirectory();
    addDoc(index1Dir);
    RAMDirectory index2Dir = new RAMDirectory();
    addDoc(index2Dir);

    // Combine both read-only readers so the search spans more than one sub-reader.
    IndexReader reader1 = IndexReader.open(index1Dir, true);
    IndexReader reader2 = IndexReader.open(index2Dir, true);
    IndexReader multi = new MultiReader(new IndexReader[] { reader1, reader2 });
    IndexSearcher searcher = new IndexSearcher(multi);

    TopDocs docs;
    docs = searcher.search(new MatchAllDocsQuery(), null, 10);
    assertEquals("Should only be two hits without the filter (just checking)", 2, docs.totalHits);
    docs = searcher.search(new MatchAllDocsQuery(), new DuplicateFilter("id"), 10);
    assertEquals("Should only be one hit because the second was a duplicate", 1, docs.totalHits);
}

private void addDoc(Directory dir) throws IOException
{
    IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
                                         IndexWriter.MaxFieldLength.UNLIMITED);
    try
    {
        Document doc = new Document();
        // Field.Index.NO stores the value but does not index it, so no terms exist for "id".
        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NO));
        writer.addDocument(doc);
        writer.commit();
    }
    finally
    {
        writer.close();
    }
}
{code}
> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment
> readers
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-2348
> URL: https://issues.apache.org/jira/browse/LUCENE-2348
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/*
> Affects Versions: 2.9.2
> Reporter: Trejkaz
>
> DuplicateFilter currently works by building a single doc ID set, without
> taking into account that getDocIdSet() will be called once per segment and
> only with each segment's local reader.
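
The per-segment behaviour described above can be illustrated with a small stand-alone model. This is only a sketch and not DuplicateFilter's actual code: it uses no Lucene classes, the class and method names are hypothetical, and each inner list stands in for one segment's "id" values. It shows why duplicate detection that sees only one segment at a time lets a cross-segment duplicate through, while state shared across segments does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone model; it does not use any Lucene classes.
// Each inner list plays the role of one segment and holds the "id"
// values of that segment's documents in local doc ID order.
class SegmentDedupSketch
{
    // Mimics a filter that is consulted once per segment and sees only that
    // segment: duplicates are detected within a segment, never across them.
    static int countWithPerSegmentState(List<List<String>> segments)
    {
        int hits = 0;
        for (List<String> segment : segments)
        {
            Set<String> seen = new HashSet<String>(); // reset for every segment
            for (String id : segment)
            {
                if (seen.add(id))
                {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Mimics the intended behaviour: one "seen" set shared by all segments.
    static int countWithSharedState(List<List<String>> segments)
    {
        Set<String> seen = new HashSet<String>();
        int hits = 0;
        for (List<String> segment : segments)
        {
            for (String id : segment)
            {
                if (seen.add(id))
                {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args)
    {
        // Two segments, each holding one document with id "1".
        List<List<String>> segments = new ArrayList<List<String>>();
        segments.add(Arrays.asList("1"));
        segments.add(Arrays.asList("1"));
        System.out.println(countWithPerSegmentState(segments)); // prints 2
        System.out.println(countWithSharedState(segments));     // prints 1
    }
}
```

In this model the per-segment variant reports 2 hits for two single-document segments that share an id, whereas the shared-state variant reports 1, which is the result the test in the comment asserts.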