[
https://issues.apache.org/jira/browse/LUCENE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Stojanovic updated LUCENE-3838:
------------------------------------
Attachment: TempTest.java
Sorry that the test added in the description is not formatted correctly.
TempTest.java is also attached.
> IndexWriter.maybeMerge() removes deleted documents from index (Lucene 3.1.0
> to 3.5.0)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3838
> URL: https://issues.apache.org/jira/browse/LUCENE-3838
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5
> Environment: Windows, Linux, OSX
> Reporter: Ivan Stojanovic
> Priority: Blocker
> Labels: api-change
> Attachments: TempTest.java
>
>
> My company uses Lucene for high-performance, heavily loaded farms of
> translation repositories with hundreds of simultaneous
> add/delete/update/search/retrieve threads. To support this complex
> architecture, among other things and tricks used here, I rely on docIds
> staying unchanged until I explicitly ask for them to change (using
> IndexWriter.optimize() / IndexWriter.forceMerge()).
> LogMergePolicy is used to get this behavior.
> This worked fine until we upgraded Lucene from 3.0.2 to 3.5.0.
> Before version 3.1.0, a merge triggered by IndexWriter.addDocument() didn't
> expunge deleted documents, so docIds stayed unchanged, which made
> some critical jobs possible without impact on index size.
> IndexWriter.optimize() did the actual removal of deleted documents.
> As of Lucene 3.1.0, IndexWriter.maybeMerge() treats deleted documents the
> same way as IndexWriter.forceMerge(): both remove them. There is no
> difference. As a result, the internal index structure changes unpredictably
> during simple document add (and possibly delete) operations, at an
> undefined point in time.
> I looked into the Lucene source code and can definitely confirm this.
> This issue makes our Lucene client code totally unusable.
> Solution steps:
> 1) Add a flag somewhere that controls whether deleted documents
> are removed in maybeMerge(). Note that this is only half of what we
> need here.
> 2) Make forceMerge() always remove deleted documents, whether or not
> maybeMerge() removes them. Alternatively, another parameter could be
> added to forceMerge() that specifies whether deleted documents
> should be removed from the index.
> Sample JUnit code that reproduces this issue is included below.
> import static org.junit.Assert.assertEquals;
> import static org.junit.Assert.assertTrue;
>
> import java.io.File;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.ConcurrentMergeScheduler;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.index.LogDocMergePolicy;
> import org.apache.lucene.index.MergePolicy;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.util.Version;
> import org.junit.Test;
>
> public class TempTest {
>
>     private Analyzer _analyzer = new KeywordAnalyzer();
>
>     @Test
>     public void testIndex() throws Exception {
>         File indexDir = new File("sample-index");
>         if (indexDir.exists()) {
>             indexDir.delete();
>         }
>         FSDirectory index = FSDirectory.open(indexDir);
>         Document doc;
>
>         // Create the index with three documents (docIds 0, 1 and 2).
>         IndexWriter writer = createWriter(index, true);
>         try {
>             doc = new Document();
>             doc.add(new Field("field", "text0", Field.Store.YES,
>                     Field.Index.ANALYZED));
>             writer.addDocument(doc);
>             doc = new Document();
>             doc.add(new Field("field", "text1", Field.Store.YES,
>                     Field.Index.ANALYZED));
>             writer.addDocument(doc);
>             doc = new Document();
>             doc.add(new Field("field", "text2", Field.Store.YES,
>                     Field.Index.ANALYZED));
>             writer.addDocument(doc);
>             writer.commit();
>         } finally {
>             writer.close();
>         }
>
>         // Delete docId 1 through a writable reader.
>         IndexReader reader = IndexReader.open(index, false);
>         try {
>             reader.deleteDocument(1);
>         } finally {
>             reader.close();
>         }
>
>         // Add more documents, committing after each one, so that merges
>         // are triggered by addDocument().
>         writer = createWriter(index, false);
>         try {
>             for (int i = 3; i < 100; i++) {
>                 doc = new Document();
>                 doc.add(new Field("field", "text" + i, Field.Store.YES,
>                         Field.Index.ANALYZED));
>                 writer.addDocument(doc);
>                 writer.commit();
>             }
>         } finally {
>             writer.close();
>         }
>
>         // docId 1 is expected to still be marked deleted and to still
>         // hold "text1".
>         boolean deleted;
>         String text;
>         reader = IndexReader.open(index, true);
>         try {
>             deleted = reader.isDeleted(1);
>             text = reader.document(1).get("field");
>         } finally {
>             reader.close();
>         }
>         assertTrue(deleted); // This assertion fails
>         assertEquals("text1", text);
>     }
>
>     private MergePolicy createEngineMergePolicy() {
>         LogDocMergePolicy mergePolicy = new LogDocMergePolicy();
>         mergePolicy.setCalibrateSizeByDeletes(false);
>         mergePolicy.setUseCompoundFile(true);
>         mergePolicy.setNoCFSRatio(1.0);
>         return mergePolicy;
>     }
>
>     private IndexWriter createWriter(Directory index, boolean create)
>             throws Exception {
>         IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_35,
>                 _analyzer);
>         iwConfig.setOpenMode(create ? IndexWriterConfig.OpenMode.CREATE
>                 : IndexWriterConfig.OpenMode.APPEND);
>         iwConfig.setMergePolicy(createEngineMergePolicy());
>         iwConfig.setMergeScheduler(new ConcurrentMergeScheduler());
>         return new IndexWriter(index, iwConfig);
>     }
> }
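
To make the reported effect concrete without depending on Lucene, here is a minimal toy model of what happens to docIds when a merge drops deleted documents. It is only a sketch under stated assumptions: the names DocIdShiftSketch, Slot, merge and the dropDeleted parameter are hypothetical and are not Lucene APIs, and the model ignores everything about real segments except document order and deletion marks. The dropDeleted parameter stands in for the kind of flag proposed in step 1 of the description.

```java
import java.util.ArrayList;
import java.util.List;

public class DocIdShiftSketch {

    /** One position in the toy index: stored text plus a deletion mark. */
    static final class Slot {
        final String text;
        final boolean deleted;

        Slot(String text, boolean deleted) {
            this.text = text;
            this.deleted = deleted;
        }
    }

    /**
     * Toy merge: concatenates all segments into a single list whose list
     * index plays the role of the docId. When dropDeleted is true, deleted
     * slots are skipped, so every later document moves to a lower docId.
     * The dropDeleted parameter is hypothetical, not a Lucene option.
     */
    static List<Slot> merge(List<List<Slot>> segments, boolean dropDeleted) {
        List<Slot> merged = new ArrayList<>();
        for (List<Slot> segment : segments) {
            for (Slot slot : segment) {
                if (dropDeleted && slot.deleted) {
                    continue; // expunge the deleted document
                }
                merged.add(slot);
            }
        }
        return merged;
    }

    /** Two small segments; docId 1 ("text1") is marked deleted. */
    static List<List<Slot>> sampleSegments() {
        List<Slot> first = new ArrayList<>();
        first.add(new Slot("text0", false));
        first.add(new Slot("text1", true));
        first.add(new Slot("text2", false));

        List<Slot> second = new ArrayList<>();
        second.add(new Slot("text3", false));

        List<List<Slot>> segments = new ArrayList<>();
        segments.add(first);
        segments.add(second);
        return segments;
    }

    public static void main(String[] args) {
        List<Slot> kept = merge(sampleSegments(), false);
        List<Slot> dropped = merge(sampleSegments(), true);
        System.out.println("keep deletes: docId 1 = " + kept.get(1).text
                + ", deleted = " + kept.get(1).deleted);
        System.out.println("drop deletes: docId 1 = " + dropped.get(1).text
                + ", deleted = " + dropped.get(1).deleted);
    }
}
```

In this model, merging with dropDeleted set to false leaves "text1" at docId 1 and still marked deleted, which is the state the test above asserts. Merging with dropDeleted set to true moves "text2" into docId 1, which mirrors the failing assertion in the report.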