Subject: Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)
From: petite_abeille <[EMAIL PROTECTED]>
Date: Wed, 1 May 2002 08:37:51 +0200
To: "Lucene Users List" <[EMAIL PROTECTED]>


    On Wednesday, May 1, 2002, at 12:41 AM, Dmitry Serebrennikov wrote:

>     - the number of files that Lucene uses depends on the number of
>     segments in the index and the number of *stored* fields
>     - if your fields are not stored but only indexed, they do not
>     require separate files. Otherwise, an .fnn file is created for
>     each field.


    Ok. That's good as all my fields are indexed but not stored in
    Lucene. Only one field is stored in any one index: the uuid of an
    object (as a Keyword).

>     - if at least one document uses a given field name in an index,
>     that index requires the .fnn file for that field


    Ok. So, in theory, a more homogeneous index should use fewer files,
    all other things being equal?

I think so... I guess you have many kinds of documents that have some 
fields in common and some unique? Yes, then having the same kinds of 
documents in a given index will reduce the total number of files. 
Personally, I don't have experience with this since all of my documents 
have the same fields.
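
To make the counting concrete, here is a rough sketch in plain Java (no Lucene involved). It only counts how many distinct field names appear in a set of documents, which is the quantity the per-field files are said to follow in the discussion above. The class name, the helper methods and the sample field names ("name", "email", "subject", "body") are all made up for illustration; only "uuid" comes from this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: counts distinct field names per index.
class FieldCountSketch {
    /** Number of field names used by at least one document in the list. */
    static int distinctFields(List<Set<String>> docs) {
        Set<String> names = new HashSet<>();
        for (Set<String> doc : docs) {
            names.addAll(doc);
        }
        return names.size();
    }

    /** Convenience: the set of field names carried by one document. */
    static Set<String> fields(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // Two invented document kinds that share only the uuid field.
        Set<String> person = fields("uuid", "name", "email");
        Set<String> message = fields("uuid", "subject", "body");

        // One mixed index versus one index per document kind.
        int mixed = distinctFields(Arrays.asList(person, message, person));
        int peopleOnly = distinctFields(Arrays.asList(person, person));
        int messagesOnly = distinctFields(Arrays.asList(message));

        System.out.println("mixed index:    " + mixed);        // 5
        System.out.println("people index:   " + peopleOnly);   // 3
        System.out.println("messages index: " + messagesOnly); // 3
    }
}
```

With this toy data each homogeneous index tracks 3 field names instead of 5. Note that the two separate indexes together still account for 6, because the shared uuid field is counted once in each, so the saving is per index and depends on how much the document kinds overlap.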


>     - index segments are created when documents are added to the
>     index. For each 10 docs you get a new segment.
>     - optimizing the index removes all segments and replaces them with
>     one new segment that contains all of the documents
>     - optimization is done periodically as more documents are added
>     (controlled by IndexWriter.mergeFactor), but can be done manually
>     whenever needed


    Ok. When doing the optimization, are there any temporary files
    getting created?

Nope, just the files for the new segment. Well, I think the "segments" 
and "deleted" files might exist as "segments.new" and "deleted.new" while 
they are being modified, with the old ones removed and the new ones 
renamed afterwards.
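
For what it's worth, the write-then-rename pattern described here can be sketched with the standard java.nio.file API. This is not Lucene's own code, just an illustration of the general technique: the new contents go into a ".new" file first, and only once that is fully written does it take the place of the old file. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of "write name.new, then rename over name".
class WriteThenRename {
    /** Writes contents to name + ".new", then moves it over the old file. */
    static void replaceFile(Path dir, String name, String contents) throws IOException {
        Path target = dir.resolve(name);
        Path pending = dir.resolve(name + ".new");
        Files.write(pending, contents.getBytes(StandardCharsets.UTF_8));
        // The old file is replaced only after the new one is complete.
        Files.move(pending, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Replaces a "segments" file twice in a temp directory, returns its final contents. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("index-sketch");
            replaceFile(dir, "segments", "segment _1");
            replaceFile(dir, "segments", "segment _2");
            byte[] bytes = Files.readAllBytes(dir.resolve("segments"));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // segment _2
    }
}
```

The point of the pattern is that a reader never sees a half-written "segments" file; the only extra file that appears, briefly, is the ".new" one.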

>     Some additional info: there is a field on IndexWriter called
>     infoStream. If this is set to a PrintStream (such as System.out),
>     various diagnostic messages about the merging process will be
>     printed to that stream.


    Yep. I guess I overlooked that.

>     You might find this helpful in tuning the merge parameters.


    Just to make sure: will using a small merge factor (e.g. 2) reduce
    the number of files, or just optimize (aka merge) the index more often?

It will optimize more often and, since optimization replaces all 
segments with one, the number of files will drop. However, the old files 
will stay around until they are no longer in use by pre-existing 
IndexReader instances, so that may be another catch.
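
To get a feel for the effect of the merge factor, here is a deliberately simplified simulation in plain Java. It is not Lucene's actual merge code: it assumes every added document starts as its own one-document segment, and that whenever the last mergeFactor segments have the same size they are merged into one. It also ignores old files kept alive by open readers. All names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of segment merging; an assumption, not Lucene's algorithm.
class MergeFactorSketch {
    /** Number of live segments after adding docs documents one at a time. */
    static int segmentCount(int docs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>(); // document count per segment
        for (int i = 0; i < docs; i++) {
            segments.add(1); // each new document starts as its own segment
            // Cascade: merge while the last mergeFactor segments are equally sized.
            while (tailIsUniform(segments, mergeFactor)) {
                int size = segments.get(segments.size() - 1);
                for (int k = 0; k < mergeFactor; k++) {
                    segments.remove(segments.size() - 1);
                }
                segments.add(size * mergeFactor);
            }
        }
        return segments.size();
    }

    /** True if the last n segments exist and all hold the same number of documents. */
    static boolean tailIsUniform(List<Integer> segments, int n) {
        if (segments.size() < n) {
            return false;
        }
        int last = segments.get(segments.size() - 1);
        for (int k = 2; k <= n; k++) {
            if (segments.get(segments.size() - k) != last) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(segmentCount(95, 10)); // 14
        System.out.println(segmentCount(95, 2));  // 6
    }
}
```

Under these assumptions, 95 documents leave 14 segments with a merge factor of 10 but only 6 with a merge factor of 2, at the price of merging far more often along the way.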

Dmitry.


