RE: Seeing what's occupying all the space in the index

Rob Staveley (Tom) Fri, 26 May 2006 10:24:02 -0700

That's a really good idea, but I've got a total of 38 fields only. It is
true that some of them are empty, but that can't account for the bulk.

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: 26 May 2006 17:50
To: java-user@lucene.apache.org
Subject: RE: Seeing what's occupying all the space in the index

are you by any chance using different field names for each document -- or do
you have a wide range of field names that aren't the same for each document?
... you mentioned indexing emails, email has a very loose header structure
that allows MTAs to add arbitrary "X" headers, are you converting every
header to an indexed field?

the reason i ask is that anytime you have an indexed field and you
don't OMIT_NORMS when you add the field, lucene allocates one byte per
document for that field to store the norm value -- even if most of those
documents don't have that field.

if you've got ~25,000 documents in your index, and you add a new document
with an indexed field no other document has, then you'll see at least a 25K
increase in index size because of those norms.

(OMIT_NORMS is your friend).
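
The arithmetic above can be sketched as a rough back-of-the-envelope estimate. This assumes exactly one norm byte per document for every indexed field that keeps its norms, as described in the message; the class and method names (NormOverhead, normBytes) are hypothetical and not part of Lucene, and real on-disk size will also depend on segments and merging.

```java
// Hypothetical helper, not part of Lucene: estimates the bytes spent on norms,
// assuming one byte per document for each indexed field that keeps its norms.
public final class NormOverhead {
    private NormOverhead() {}

    /** Returns numDocs * fieldsWithNorms, the estimated norm storage in bytes. */
    public static long normBytes(long numDocs, long fieldsWithNorms) {
        if (numDocs < 0 || fieldsWithNorms < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(numDocs, fieldsWithNorms);
    }

    public static void main(String[] args) {
        // One new indexed field in a ~25,000-document index.
        System.out.println(normBytes(25_000, 1));   // prints 25000
        // The same index with 38 indexed fields, all keeping norms.
        System.out.println(normBytes(25_000, 38));  // prints 950000
    }
}
```

Under that assumption, 38 fields over ~25,000 documents come to under 1 MB of norms, so norms only dominate the index when the number of distinct indexed field names is much larger.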
