Interesting. I am explicitly turning on the compound file format when I start my application, but I am suspicious of my optimising thread. It *ought* to be optimising every 30 minutes, using thread synchronisation to prevent the writer from writing while optimisation takes place, but it is possible that I'm screwing up there (I'll add some diagnostics to check that optimisation and index writing are mutually exclusive). When I stopped my daemon and optimised manually, it took 11 minutes to optimise the index.

Is it your understanding that the .fdt, .frq and .prx files are working files pre-optimisation, and that when optimize() is called they should all get absorbed into the .cfs? Manual optimisation only clawed back 1G, but I didn't check whether the .fdt, .frq and .prx files were absorbed into the .cfs files in the process. I'll investigate that now.
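For checking which file types are actually taking up the space, something along these lines might do. It is only a sketch: it uses plain java.nio rather than any Lucene API, the names IndexSpaceReport and bytesByExtension are made up for illustration, and it assumes a Java 8 or later runtime and a flat index directory passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: totals file sizes per extension.
class IndexSpaceReport {

    // Sum the size in bytes of the regular files directly inside indexDir,
    // grouped by extension (".cfs", ".fdt", ...). Files with no dot in the
    // name, such as "segments" or "deletable", are grouped under "(none)".
    public static Map<String, Long> bytesByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and anything else
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot < 0 ? "(none)" : name.substring(dot);
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory if no path is given.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        bytesByExtension(indexDir).forEach((ext, bytes) ->
                System.out.printf("%-8s %,15d bytes%n", ext, bytes));
    }
}
```

Running it before and after a manual optimize() would show whether the .fdt, .frq and .prx totals drop to nothing while the .cfs total grows.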
> Can you try a smaller sample in a clean directory and see what size it
> is (so that it doesn't take as long to index)?

I'll try tee-ing off a message feed and index in a new index. I'm working with a live message feed.

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 18:38
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index

It seems odd to me that if you are using the CFS format, why you would have the .fdt, .frq and .prx files in addition to the .cfs files. My understanding is all files (except deletable and segment) get put inside of the CFS file. Looking at my indices, I only have the CFS file. Are you optimizing your indices after you are done indexing? Are you turning off compound file format? Can you try a smaller sample in a clean directory and see what size it is (so that it doesn't take as long to index)?