In my index of e-mail message parts, it looks like 23K is being used up for each indexed message part, which is way more than I'd expect.
I have a total of 37 fields per message part. I tokenize, index and do not store message part bodies. I store a <= 300 character synopsis of each message part. All of the other fields are message metadata, which is tokenized, indexed and stored but these rarely exceed 100 characters - they are for example To, From, Cc, Subject, Date I'm still using Lucene 1.4.3, but am in the process of migrating to 1.9. Is there any way that I can get a picture of what's occupying all the space?
smime.p7s
Description: S/MIME cryptographic signature