[ 
https://issues.apache.org/jira/browse/LUCENE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Stojanovic updated LUCENE-3838:
------------------------------------

    Attachment: TempTest.java

Sorry that the test added to the description is not formatted correctly. 
TempTest.java is also attached.
                
> IndexWriter.maybeMerge() removes deleted documents from index (Lucene 3.1.0 
> to 3.5.0)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3838
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3838
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5
>         Environment: Windows, Linux, OSX
>            Reporter: Ivan Stojanovic
>            Priority: Blocker
>              Labels: api-change
>         Attachments: TempTest.java
>
>
> My company uses Lucene for high-performance, heavily loaded farms of 
> translation repositories with hundreds of simultaneous 
> add/delete/update/search/retrieve threads. To support this complex 
> architecture I rely, among other things, on docIds staying unchanged until 
> I explicitly ask for them to change (using IndexWriter.optimize() / 
> IndexWriter.forceMerge()).
> LogMergePolicy is used to get this behavior.
> This worked fine until we upgraded Lucene from 3.0.2 to 3.5.0. 
> Before version 3.1.0, a merge triggered by IndexWriter.addDocument() did not 
> expunge deleted documents, so docIds stayed unchanged and some critical 
> jobs were possible without impact on index size. 
> Only IndexWriter.optimize() actually removed deleted documents.
> As of Lucene 3.1.0, IndexWriter.maybeMerge() treats deleted documents the 
> same way IndexWriter.forceMerge() does: it removes them. There is no 
> difference. As a result, the internal index structure changes unpredictably, 
> at an undefined point in time, during simple document add (and possibly 
> delete) operations. 
> I looked into the Lucene source code and can confirm this.
> This issue makes our Lucene client code totally unusable.
> Solution steps:
> 1) Add a flag somewhere that controls whether deleted documents are 
> removed in maybeMerge(). Note that this is only half of what we need.
> 2) Make forceMerge() always remove deleted documents, whether or not 
> maybeMerge() removes them. Alternatively, add another parameter to 
> forceMerge() that says whether deleted documents should be removed from 
> the index.
> The sample JUnit code below reproduces the issue.
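> import static org.junit.Assert.assertEquals;
> import static org.junit.Assert.assertTrue;
> import java.io.File;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.ConcurrentMergeScheduler;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.index.LogDocMergePolicy;
> import org.apache.lucene.index.MergePolicy;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.util.Version;
> import org.junit.Test;
> public class TempTest {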
>     private Analyzer _analyzer = new KeywordAnalyzer();
>     @Test
>     public void testIndex() throws Exception {
>       File indexDir = new File("sample-index");
>       if (indexDir.exists()) {
>           indexDir.delete();
>       }
>       FSDirectory index = FSDirectory.open(indexDir);
>       Document doc;
>       IndexWriter writer = createWriter(index, true);
>       try {
>           doc = new Document();
>           doc.add(new Field("field", "text0", Field.Store.YES,
>                   Field.Index.ANALYZED));
>           writer.addDocument(doc);
>           doc = new Document();
>           doc.add(new Field("field", "text1", Field.Store.YES,
>                   Field.Index.ANALYZED));
>           writer.addDocument(doc);
>           doc = new Document();
>           doc.add(new Field("field", "text2", Field.Store.YES,
>                   Field.Index.ANALYZED));
>           writer.addDocument(doc);
>           writer.commit();
>       } finally {
>           writer.close();
>       }
>       IndexReader reader = IndexReader.open(index, false);
>       try {
>           reader.deleteDocument(1);
>       } finally {
>           reader.close();
>       }
>       writer = createWriter(index, false);
>       try {
>           for (int i = 3; i < 100; i++) {
>               doc = new Document();
>               doc.add(new Field("field", "text" + i, Field.Store.YES,
>                       Field.Index.ANALYZED));
>               writer.addDocument(doc);
>               writer.commit();
>           }
>       } finally {
>           writer.close();
>       }
>       boolean deleted;
>       String text;
>       reader = IndexReader.open(index, true);
>       try {
>           deleted = reader.isDeleted(1);
>           text = reader.document(1).get("field");
>       } finally {
>           reader.close();
>       }
>       assertTrue(deleted); // Fails on 3.1.0 to 3.5.0: docId 1 is no longer a deleted slot
>       assertEquals("text1", text);
>     }
>     private MergePolicy createEngineMergePolicy() {
>       LogDocMergePolicy mergePolicy = new LogDocMergePolicy();
>       mergePolicy.setCalibrateSizeByDeletes(false);
>       mergePolicy.setUseCompoundFile(true);
>       mergePolicy.setNoCFSRatio(1.0);
>       return mergePolicy;
>     }
>     private IndexWriter createWriter(Directory index, boolean create)
>           throws Exception {
>       IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_35,
>               _analyzer);
>       iwConfig.setOpenMode(create ? IndexWriterConfig.OpenMode.CREATE
>               : IndexWriterConfig.OpenMode.APPEND);
>       iwConfig.setMergePolicy(createEngineMergePolicy());
>       iwConfig.setMergeScheduler(new ConcurrentMergeScheduler());
>       return new IndexWriter(index, iwConfig);
>     }
> }
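
The docId shift behind the failing assertion can be illustrated with a toy
model in plain Java. This is only a sketch and not Lucene code: the class
DocIdShiftDemo and the method mergeExpungingDeletes are hypothetical names,
and the index is modeled as a simple list in which a document's position
is its docId. It assumes that a merge which expunges deletions copies only
the live documents into the new segment, which is the behavior the report
describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Toy model only: a list position stands in for a docId. Not Lucene code.
public class DocIdShiftDemo {

    /**
     * Models a merge that expunges deletions: every live document is copied
     * into a new list, so the survivors are renumbered densely.
     */
    static List<String> mergeExpungingDeletes(List<String> docs, Set<Integer> deleted) {
        List<String> merged = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            if (!deleted.contains(docId)) {
                merged.add(docs.get(docId));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("text0", "text1", "text2", "text3");
        Set<Integer> deleted = Collections.singleton(1);

        // Before the merge, docId 1 is a deleted slot that still holds "text1".
        System.out.println("before: docId 1 -> " + docs.get(1)
                + " (deleted=" + deleted.contains(1) + ")");

        // After the merge, docId 1 refers to a different, live document.
        List<String> merged = mergeExpungingDeletes(docs, deleted);
        System.out.println("after:  docId 1 -> " + merged.get(1));
    }
}
```

In this model docId 1 holds the deleted "text1" before the merge and the
live "text2" afterwards, which mirrors why both assertions at the end of
TempTest depend on the merge leaving deleted documents in place.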
