[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

Doron Cohen (JIRA) Wed, 20 Dec 2006 20:28:44 -0800

     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]


Doron Cohen updated LUCENE-756:
-------------------------------

    Attachment: nrm.patch.txt

Replacing the patch file (prev file was garbage - "svn stat" instead of "svn 
diff").

A few words on how this patch works: 
- <segment>.nrm file was added.
- addDocument (DocumentWriter) still writes each norm to a separate file, but 
that happens in memory.
- at merge, all norms are written to a single file.
- CFS now also maintains all norms in a single file.
- IndexWriter merge-decision now considers hasSeparateNorms() not only for CFS 
but also for non-compound indexes.
- SegmentReader.openNorms() still creates ready-to-use/load Norm objects (which 
would read the norms only when needed). But the Norm object is now assigned a 
normSeek value, which is nonzero if the norm file is <segment>.nrm.
- existing indexes, created prior to this change, are managed the same way as 
segments produced by addDocument.
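
To illustrate the single-file layout and the normSeek idea, here is a minimal, 
self-contained sketch. It is not the code from the patch: the class name, the 
4-byte header, and the helper methods are all assumptions made for 
illustration, and it works on an in-memory byte array rather than on a real 
<segment>.nrm file. It assumes one norm byte per document per field, with the 
norms of all fields concatenated after the header.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from the patch.
final class SingleNormsFileSketch {
    // Assumed header. Because it is non-empty, every offset into the
    // single norms file is nonzero.
    static final byte[] HEADER = {'N', 'R', 'M', -1};

    /** Concatenates the norms of all fields (one byte per document) after the header. */
    public static byte[] writeNorms(Map<String, byte[]> normsByField) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER, 0, HEADER.length);
        for (byte[] norms : normsByField.values()) {
            out.write(norms, 0, norms.length);
        }
        return out.toByteArray();
    }

    /** Computes, for each field in write order, the offset where its norms start. */
    public static Map<String, Long> normSeeks(Iterable<String> fieldsInOrder, int maxDoc) {
        Map<String, Long> seeks = new LinkedHashMap<>();
        long seek = HEADER.length;
        for (String field : fieldsInOrder) {
            seeks.put(field, seek);
            seek += maxDoc; // each field contributes maxDoc bytes
        }
        return seeks;
    }

    /** Reads the norms of one field, starting at its normSeek offset. */
    public static byte[] readNorms(byte[] nrm, long normSeek, int maxDoc) {
        return Arrays.copyOfRange(nrm, (int) normSeek, (int) normSeek + maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 3;
        Map<String, byte[]> norms = new LinkedHashMap<>();
        norms.put("title", new byte[] {10, 20, 30});
        norms.put("body", new byte[] {40, 50, 60});

        byte[] nrm = writeNorms(norms);
        Map<String, Long> seeks = normSeeks(norms.keySet(), maxDoc);

        // Only the requested field is read, mirroring the load-when-needed idea.
        byte[] body = readNorms(nrm, seeks.get("body"), maxDoc);
        System.out.println("body normSeek = " + seeks.get("body"));
        System.out.println("body norms    = " + Arrays.toString(body));
    }
}
```

In this sketch a norm kept in its own separate file would simply use an offset 
of zero, which is one way a reader could tell the two layouts apart.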

Tests:
- I verified that the (contrib) tests for FieldNormModifier and 
LengthNormModifier also pass.

Remaining:
- I might add a test.
- more benchmarking?
- update fileFormat document.

> Maintain norms in a single file .nrm
> ------------------------------------
>
>                 Key: LUCENE-756
>                 URL: http://issues.apache.org/jira/browse/LUCENE-756
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>            Priority: Minor
>         Attachments: nrm.patch.txt
>
>
> Non-compound indexes are ~10% faster at indexing, and perform 50% of the IO 
> activity compared to compound indexes. But their file descriptor footprint is 
> much higher. 
> By maintaining all field norms in a single .nrm file, we can bound the number 
> of files used by non-compound indexes, and possibly allow more applications 
> to use this format.
> More details on the motivation for this in: 
> http://www.nabble.com/potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-tf2826909.html
>  (in particular 
> http://www.nabble.com/Re%3A-potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-p7910403.html).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

