Hi Li, Sorry for taking so long to answer your questions.
We started splitting our index into smaller units once we realized that we would have to deal with an index many GB in size. Updating and optimizing such large files becomes a bottleneck. We partitioned our index based on when the indexed units were created. Updates usually happen only on current units and rarely on units from previous years.

In terms of performance I think there is very little difference and, as stated in another response, it really depends on your hardware.

All index directories are located on the same box and drive.

The documents are not distributed into several files. I suppose you are not talking about a Lucene document but rather about an indexed unit. It really depends on how you organize your index, but my experience is not to split one indexed unit into parts. When I started to index our units we separated metadata from aggregated units, for example a book's metadata (ISBN etc.) and its pages. Each page (or aggregated unit) was a single Lucene document. This made it somewhat difficult to assemble the information the way the UI required, so we went back to treating one unit and its aggregates as a single Lucene document, which made reading faster.

Andreas

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 19, 2007 8:05 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene index performance

Hi Andreas,

I am very interested in indexing and searching with multiple index files. Can you kindly help me with the following questions?

1) Why do you use multiple index files? How much is the performance gain for both indexing and searching? Someone reported that there is no big performance difference unless the number of indices is huge, like 1000.
2) Are these index files located on a single machine or distributed across multiple machines?
3) How do you distribute the documents into several index files?

Thanks a lot,
Li

-----Original Message-----
From: Andreas Guther [mailto:[EMAIL PROTECTED]
Sent: Monday, June 18, 2007 4:00 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene index performance

Searching on multiple index files is incredibly fast. We have 10 different index folders with different sizes. All folders together have a size of 7 GB. Results usually come back in less than 50 ms.

Getting results out of the index, i.e. reading documents, is expensive, and you will have to spend time here to get good performance.

You will need to look into

- TopDocs
- Extracting results in an ordered way, i.e. sort by index and within an index by document id. This will help to minimize disk head jumps and gave me a tremendous boost.
- Extracting only what you need (using a special read filter; I do not recall the name right now and I do not have access to my sources at the moment of writing this)

Andreas

On 6/17/07, Mark Miller <[EMAIL PROTECTED]> wrote:
>
> Lee Li Bin wrote:
> > Hi,
> >
> > I would like to know how's the performance during indexing and searching of
> > results on a large index files would be like.
> >
> Fast.
> > And is it possible to create multiple index files and search across multiple
> > index files?
> Yes.
> > If possible, may I know how could it be done?
> >
> Check out MultiSearcher.
>
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/MultiSearcher.html
> > Thanks a lot.
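
The ordered extraction mentioned in the thread (sort by index, and within an index by document id, before reading) could be sketched roughly as follows. This is only an illustration in plain Java, not code from the thread: Hit and OrderedHitReader are hypothetical names and not Lucene classes, and the actual loading of stored documents is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for one search hit (not a Lucene class): the number of
// the index folder it came from, its document id in that index, and its score.
final class Hit {
    final int indexNo;
    final int docId;
    final float score;

    Hit(int indexNo, int docId, float score) {
        this.indexNo = indexNo;
        this.docId = docId;
        this.score = score;
    }
}

final class OrderedHitReader {
    // Returns a copy of the hits sorted by index number and, within the same
    // index, by document id. The input list is left untouched so the original
    // ranking stays available for display.
    static List<Hit> inReadOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.indexNo)
                              .thenComparingInt(h -> h.docId));
        return sorted;
    }

    public static void main(String[] args) {
        // Hits as a search might return them, ordered by score.
        List<Hit> byScore = Arrays.asList(
                new Hit(2, 40, 0.9f),
                new Hit(0, 17, 0.8f),
                new Hit(2, 3, 0.7f),
                new Hit(0, 5, 0.6f));

        for (Hit h : inReadOrder(byScore)) {
            // A real implementation would read the stored document here.
            System.out.println("index " + h.indexNo + ", doc " + h.docId);
        }
    }
}
```

The documents would be read in this order and then shown to the user in the original score order, which is why the sketch sorts a copy instead of the result list itself.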

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]