Hi,

I just downloaded the nightly build zip, and it does not include these patches (probably the same for a lot of other Lucene users). I am not using CVS, so how can I apply the file handle changes?
Regards,
Karsten

-----Original Message-----
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 23, 2003 01:16
To: Lucene Developers List
Subject: Re: file handle changes

Greetings again.

I've implemented the file handle reduction changes, roughly as proposed before. Here are the patches for your enjoyment! :)

------------------------------------------
SUMMARY:
The goal of this patch is to drastically reduce the number of file handles required by Lucene. This is achieved by reducing the number of files required by a single index segment from N to 1, where N depends on the number of indexed fields in the segment. Typically, one should see a drop in the number of file handles by an order of magnitude! It could even be greater for indexes that contain large numbers of indexed fields. The best part is that to take advantage of this feature, one simply needs to call setUseCompoundFiles(true) on an IndexWriter before putting documents into it. Everything else is automatic!

------------------------------------------
DETAILS:
The proposed implementation adds a new property to the IndexWriter -- get/setUseCompoundFiles(boolean). This property defaults to false, which is the existing behavior prior to this patch. If the property is set to true, all segments created by this IndexWriter will be of the "compound file" format. Compound file segments have only one main file - <id>.cfs. Document deletions are handled as before -- if documents from this segment are deleted, a second file named <id>.del is created (I didn't change this code).

The get/setUseCompoundFiles setting can be changed at any time during the existence of the IndexWriter and takes effect the next time the IndexWriter merges segments in its target directory. SegmentIndexReader can now work with either type of segment.
This change does not affect how the segments are handled in the temporary RAMDirectory used by the IndexWriter internally, only the final segments written to the target directory. Also, a given directory can contain both types of segments and everything works out automagically.

-----------------------------------------
I have also created a new JUnit test case to test these features, which runs successfully. For the moment it creates files under the current working directory in which JUnit is executed. I also converted some of the older tests "XXXTest" into "TestXXX", and made sure they work with the old implementation and the new one. These tests do not yet make enough assert(...) calls, but they now execute twice, once with the multi-file indexes and once with the new compound file indexes, and assert that the output is the same. The old files are still there; I just added new ones with the inverted names. In one case - ThreadSafetyTest.java - I actually made changes to that file because I thought this test was too long to run as an automatic test in JUnit. Build.xml required a small change to add a class from the src/demo tree to the classpath.

----------------------------------------
Doug, I really did consider keeping everything at the Directory level, as you suggested. That would have been preferable, I agree, but I couldn't find a way to reconcile this approach with the other two goals I had: (a) keep specific file extension knowledge at the lucene.index.* level where it is now, and (b) avoid having to support writes to the compound file.

----------------------------------------
I'm attaching the patches against the current Lucene CVS source (basically the output of "cvs diff -Buw"). The files listed as "?" are new files and are also attached. (BTW, there are currently two failures in the existing JUnit test cases, but they occur with or without these patches, as has already been noted by Otis, Doug and Eric).
Finally, I should theoretically have commit access to Lucene's CVS, but I haven't tried using it yet. If these changes seem ok, I could commit them myself (provided I can find my password, etc., etc.).

Enjoy.
Dmitry.
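
For readers who want a feel for the "N files to 1" idea in the SUMMARY above, here is a minimal Java sketch of packing several named parts into one file and reading one part back through a single file handle. It is only an illustration under stated assumptions: the class name CompoundFileSketch, the part names, and the on-disk layout (entry count, name/length table, then raw data) are all made up for this example and are not the .cfs format or the code from the patch.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; NOT the layout or code used by the Lucene patch. */
public class CompoundFileSketch {

    /** Writes: entry count, then a (name, length) table, then every part's bytes in order. */
    public static void pack(Path target, Map<String, byte[]> parts) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(target))) {
            out.writeInt(parts.size());
            for (Map.Entry<String, byte[]> e : parts.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().length);
            }
            for (byte[] data : parts.values()) {
                out.write(data);
            }
        }
    }

    /** Reads one named part back while holding only one open file handle. */
    public static byte[] read(Path source, String name) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(source.toFile(), "r")) {
            int count = in.readInt();
            String[] names = new String[count];
            int[] lengths = new int[count];
            for (int i = 0; i < count; i++) {
                names[i] = in.readUTF();
                lengths[i] = in.readInt();
            }
            long offset = in.getFilePointer(); // data area starts right after the table
            for (int i = 0; i < count; i++) {
                if (names[i].equals(name)) {
                    byte[] data = new byte[lengths[i]];
                    in.seek(offset);
                    in.readFully(data);
                    return data;
                }
                offset += lengths[i]; // skip over the parts that come before the one we want
            }
        }
        throw new IOException("no such part: " + name);
    }

    /** Packs two placeholder parts into a temp file and reads the second one back. */
    public static String demo() {
        try {
            Path file = Files.createTempFile("segment", ".cfs");
            try {
                Map<String, byte[]> parts = new LinkedHashMap<>();
                parts.put("terms", "alpha beta".getBytes(StandardCharsets.UTF_8));
                parts.put("postings", "1 2 3".getBytes(StandardCharsets.UTF_8));
                pack(file, parts);
                return new String(read(file, "postings"), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch writes the whole file in one pass and never appends to it afterwards, which is in the spirit of goal (b) in the message (no support for writes into an existing compound file), though the patch itself may go about this quite differently.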
