Hi,

I'm wondering if there is a kind of "formula" to estimate the size of a
Lucene index. Searching the list, I did not find any pointers.

Does anybody have a hint?

From the file format description and some empirical tests, this is what
I have figured out so far for each index file:
Field-files:
  field-data .fdt:  NumberOfDocs * NumberOfFieldsPerDoc
  field-index .fdx: NumberOfDocs * 8
  field-info .fnm:  ignored

Term-files:
  term-data .tis:   NumberOfTerms * 8
  term-index .tii:  no idea so far
  term-freq .frq:   estimated as NumberOfDocs * NumberOfTerms

Normalization:
  norm-file .nrm:   NumberOfDocs

This concerns only unstored fields, of course.

I estimate the total NumberOfTerms of my document collection at 10% of
the NumberOfDocs. Does anyone have similar experience?
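
To make the arithmetic concrete, here is a minimal sketch that just adds
up the per-file guesses listed above. The function name
estimate_index_size and its parameters are made up for illustration,
the results are assumed to be in bytes, the .tii and .fnm files are left
out, and the 10% term ratio is only my own rule of thumb, so treat the
output as a rough guess rather than a verified figure.

```python
def estimate_index_size(num_docs, fields_per_doc, num_terms=None):
    """Sum the rough per-file estimates from this post (assumed bytes).

    Hypothetical helper for illustration only; it is not part of Lucene.
    """
    if num_terms is None:
        # Rule of thumb from this post: terms are about 10% of the docs.
        num_terms = num_docs // 10

    sizes = {
        ".fdt": num_docs * fields_per_doc,  # field data
        ".fdx": num_docs * 8,               # field index
        ".tis": num_terms * 8,              # term data
        ".frq": num_docs * num_terms,       # term frequencies (rough guess)
        ".nrm": num_docs,                   # norms
    }
    # .tii (term index) and .fnm (field info) are not modelled here.
    sizes["total"] = sum(sizes.values())
    return sizes


if __name__ == "__main__":
    for name, size in estimate_index_size(1000, 5).items():
        print(f"{name:>6}: {size}")
```

With these guesses the .frq term dominates the total, since it is the
only one that grows with NumberOfDocs * NumberOfTerms.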

lofi

