[ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]
Doron Cohen updated LUCENE-756:
-------------------------------
Attachment: nrm.patch.txt
Replacing the patch file (the previous file was garbage: "svn stat" output
instead of "svn diff").
A few words on how this patch works:
- A <segment>.nrm file was added.
- addDocument (DocumentWriter) still writes each norm to a separate file, but
that happens in memory.
- At merge, all norms are written to a single file.
- CFS now also maintains all norms in a single file.
- The IndexWriter merge decision now considers hasSeparateNorms() not only for
compound (CFS) segments but also for non-compound ones.
- SegmentReader.openNorms() still creates ready-to-use/load Norm objects (which
read the norms only when needed). But each Norm object is now assigned a
normSeek value, which is nonzero if the norm file is <segment>.nrm.
- Existing indexes, created prior to this change, are managed the same way as
segments produced by addDocument.
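To make the normSeek idea concrete, here is a minimal, self-contained sketch.
It is not the code in the patch: the class and method names, the 4-byte header,
and the "one block of maxDoc bytes per field" layout are all assumptions made
for illustration. It only shows how a single norms file can be addressed by a
per-field offset, and how a Norm-like object can defer reading until needed.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Lucene code: names and header size are assumptions. */
public class NrmLayoutSketch {

    // Assumed fixed-size header at the start of the single norms file.
    static final byte[] HEADER = {'N', 'R', 'M', 1};

    /** At merge time: append one block of norm bytes per field after the header. */
    static byte[] writeSingleNormsFile(List<byte[]> normsPerField) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER, 0, HEADER.length);
        for (byte[] norms : normsPerField) {
            out.write(norms, 0, norms.length);
        }
        return out.toByteArray();
    }

    /** Offset of a field's block in the single file; nonzero because of the header. */
    static long normSeek(int fieldIndex, int maxDoc) {
        return HEADER.length + (long) fieldIndex * maxDoc;
    }

    /** Stand-in for a Norm object: created up front, bytes read only on first use. */
    static final class Norm {
        private final byte[] file;   // stands in for the opened norms file
        private final long normSeek; // in this sketch, 0 would mean a separate per-field file
        private final int maxDoc;
        private byte[] bytes;        // loaded lazily

        Norm(byte[] file, long normSeek, int maxDoc) {
            this.file = file;
            this.normSeek = normSeek;
            this.maxDoc = maxDoc;
        }

        boolean isLoaded() {
            return bytes != null;
        }

        byte[] bytes() {
            if (bytes == null) {
                bytes = new byte[maxDoc];
                System.arraycopy(file, (int) normSeek, bytes, 0, maxDoc);
            }
            return bytes;
        }
    }

    public static void main(String[] args) {
        int maxDoc = 3;
        byte[] file = writeSingleNormsFile(Arrays.asList(
                new byte[] {10, 11, 12},    // norms of field 0
                new byte[] {20, 21, 22}));  // norms of field 1
        Norm norm = new Norm(file, normSeek(1, maxDoc), maxDoc);
        System.out.println(norm.isLoaded());  // prints "false": nothing read yet
        System.out.println(norm.bytes()[0]);  // prints "20": first norm of field 1
    }
}
```

With a layout like this, the number of norm files per segment no longer depends
on the number of fields; only the offset stored in each Norm object changes.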
Tests:
- I also verified that the (contrib) tests for FieldNormModifier and
LengthNormModifier pass.
Remaining:
- I might add a test.
- more benchmarking?
- update fileFormat document.
> Maintain norms in a single file .nrm
> ------------------------------------
>
> Key: LUCENE-756
> URL: http://issues.apache.org/jira/browse/LUCENE-756
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Doron Cohen
> Assigned To: Doron Cohen
> Priority: Minor
> Attachments: nrm.patch.txt
>
>
> Non-compound indexes are ~10% faster at indexing, and perform 50% of the IO
> activity compared to compound indexes. But their file descriptor footprint is
> much higher.
> By maintaining all field norms in a single .nrm file, we can bound the number
> of files used by non-compound indexes, and possibly allow more applications
> to use this format.
> More details on the motivation for this in:
> http://www.nabble.com/potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-tf2826909.html
> (in particular
> http://www.nabble.com/Re%3A-potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-p7910403.html).