Give Luke a try; it is a GUI tool for browsing the fields and terms in a
Lucene index. Searching for "Luke Lucene" should find it. Otherwise, check
the Lucene website for a link.
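
For a rough first picture before opening the index in a tool, it can help to total the index files on disk by extension. The following is a minimal sketch, not part of Lucene: the function names are made up for illustration, and it only looks at file sizes. It assumes the index is not in compound format; if everything is bundled into .cfs files, the breakdown is hidden. In the non-compound layout, .fdt should hold the stored field data, .frq and .prx the term frequencies and positions, and .tis the term dictionary.

```python
import os
from collections import defaultdict


def index_size_by_extension(index_dir):
    """Return {extension: total_bytes} for files directly inside index_dir.

    Hypothetical helper: it reads file sizes only and never opens the index.
    """
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files with no extension (e.g. "segments") are keyed by full name.
            ext = os.path.splitext(name)[1] or name
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def print_report(totals):
    """Print each extension with its size and share of the total, largest first."""
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * size / grand_total
        print(f"{ext:>10} {size:>12} bytes {share:5.1f}%")
```

Calling print_report(index_size_by_extension("/path/to/index")), with the path replaced by your own index directory, lists the largest file types first. If the .frq and .prx totals dominate, the space is probably going to the tokenized bodies rather than to the stored fields.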
Rob Staveley (Tom) wrote:
In my index of e-mail message parts, it looks like 23K is being used up for
each indexed message part, which is way more than I'd expect.
I have a total of 37 fields per message part.
I tokenize, index and do not store message part bodies.
I store a <= 300 character synopsis of each message part.
All of the other fields are message metadata, which is tokenized, indexed
and stored, but these rarely exceed 100 characters. Examples are To, From,
Cc, Subject and Date.
I'm still using Lucene 1.4.3, but am in the process of migrating to 1.9.
Is there any way that I can get a picture of what's occupying all the space?
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886