That's a really good idea, but I only have 38 fields in total. It's true that some of them are empty, but that can't account for the bulk of the index.
-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 17:50
To: java-user@lucene.apache.org
Subject: RE: Seeing what's occupying all the space in the index

are you by any chance using different field names for each document -- or do you have a wide range of field names that aren't the same for each document?

... you mentioned indexing emails, email has a very loose header structure that allows MTAs to add arbitrary "X" headers, are you converting every header to an indexed field?

the reason i ask is that anytime you have an indexed field and you don't OMIT_NORMS when you add the field, lucene allocates one byte per document for that field to store the norm value -- even if most of those documents don't have that field. if you've got ~25,000 documents in your index, and you add a new document with an indexed field no other document has, then you'll see at least a 25K increase in index size because of those norms. (OMIT_NORMS is your friend).
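As a rough sanity check on the norms arithmetic quoted above, here is a minimal sketch that just multiplies it out. It assumes the rule exactly as stated (one byte per document for every indexed field that keeps norms) and borrows the ~25,000 document figure from the example, which is not the real size of my index. The class and method names are made up for illustration and are not part of Lucene.

```java
public class NormOverhead {
    // Estimated norm storage in bytes: one byte per document for each
    // indexed field that keeps norms, whether or not a given document
    // actually contains that field (the rule as stated in the quoted mail).
    static long normBytes(long numDocs, long fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long docs = 25_000L; // illustrative figure taken from the quoted example

        // One extra indexed field that only a single document uses.
        System.out.println(normBytes(docs, 1));  // prints 25000

        // All 38 fields indexed with norms.
        System.out.println(normBytes(docs, 38)); // prints 950000
    }
}
```

At that document count, 38 fields would cost a little under 1 MB in norms, so norms alone would only explain the size if the number of distinct field names were far larger than 38.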