Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

petite_abeille Tue, 30 Apr 2002 23:20:06 -0700

On Wednesday, May 1, 2002, at 12:41 AM, Dmitry Serebrennikov wrote:

> - the number of files that Lucene uses depends on the number of 
> segments in the index and the number of *stored* fields
> - if your fields are not stored but only indexed, they do not require 
> separate files. Otherwise, an .fnn file is created for each field.


Ok. That's good as all my fields are indexed but not stored in Lucene. 
Only one field is stored in any one index: the uuid of an object (as a 
Keyword).

> - if at least one document uses a given field name in an index, that 
> index requires the .fnn file for that field

Ok. So, in theory, a more homogeneous index should use fewer files, all 
other things being equal?

> - index segments are created when documents are added to the index. For 
> each 10 docs you get a new segment.
> - optimizing the index removes all segments and replaces them with one 
> new segment that contains all of the documents
> - optimization is done periodically as more documents are added 
> (controlled by IndexWriter.mergeFactor), but can be done manually 
> whenever needed

Ok. During optimization, are any temporary files created?

> With all this, I think Lucene does use too many files...

That's my impression also...

> Some additional info: there is a field on IndexWriter called 
> infoStream. If this is set to a PrintStream (such as System.out), 
> various diagnostic messages about the merging process will be printed 
> to that stream.

Yep. I guess I overlooked that.

> You might find this helpful in tuning the merge parameters.

Just to make sure: will using a small merge factor (e.g. 2) reduce the 
number of files, or just optimize (i.e. merge) the index more often?
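
To make the question concrete, here is a toy model of the trade-off. It 
is not Lucene's merge code. It assumes a new segment for every 10 
documents (the figure quoted above) and assumes that whenever the newest 
mergeFactor segments have the same size they are merged into one. The 
class and method names (MergeSketch, simulate) are made up for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSketch {
    // Assumption taken from the quoted message: one new segment per 10 docs.
    static final int DOCS_PER_SEGMENT = 10;

    /** Returns {segments left at the end, number of merges performed}. */
    static int[] simulate(int docs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> segments = new ArrayList<>(); // doc count per segment
        int merges = 0;
        for (int added = DOCS_PER_SEGMENT; added <= docs; added += DOCS_PER_SEGMENT) {
            segments.add(DOCS_PER_SEGMENT); // a new small segment appears
            // Toy rule: merge while the newest mergeFactor segments are equal in size.
            while (tailIsMergeable(segments, mergeFactor)) {
                int merged = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
                merges++;
            }
        }
        return new int[] { segments.size(), merges };
    }

    static boolean tailIsMergeable(List<Integer> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        int size = segments.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (segments.get(i) != size) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] { 2, 10 }) {
            int[] r = simulate(990, mergeFactor);
            System.out.println("mergeFactor=" + mergeFactor
                    + " segments=" + r[0] + " merges=" + r[1]);
        }
        // prints:
        // mergeFactor=2 segments=4 merges=95
        // mergeFactor=10 segments=18 merges=9
    }
}
```

If this model is anywhere near the real behaviour, the answer would be 
"both": for 990 documents, a merge factor of 2 leaves fewer segments 
(and so fewer files) at any one time, but performs many more merges 
than a merge factor of 10.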

> Hope this helps.
> Good luck.

Thanks. Very helpful indeed :-)

R.


