Re: waaaay too many files in the index!

Michael McCandless Wed, 04 Feb 2009 02:13:29 -0800

These files are normal Lucene segment files (in compound file format). What's odd is that Lucene is not merging them down to a smaller set of segments.

Have you done any advanced things, like customizing the deletion or merge policy?


When you close your writer, are you using just close() or close(false)?

If you can set the InfoStream on the writer for one of your incremental update sessions and post the results, maybe that will shed some light on what's going on.
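
For illustration, a minimal sketch of a log stream that could be handed to IndexWriter.setInfoStream(PrintStream). The helper class InfoStreamLog and the log file name are assumptions made up for this example and are not part of Lucene; only the standard java.io classes are used here.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

// Hypothetical helper (not part of Lucene): opens a log stream that can be
// passed to the writer's setInfoStream method.
final class InfoStreamLog {
    static PrintStream open(File logFile) throws IOException {
        // append = true keeps the output of earlier update sessions;
        // autoflush = true flushes on each println, so the last lines
        // survive even if the process dies unexpectedly.
        return new PrintStream(new FileOutputStream(logFile, true), true, "UTF-8");
    }
}
```

With an existing writer (called `writer` here purely for illustration), something like `writer.setInfoStream(InfoStreamLog.open(new File("lucene-infostream.log")))` before the update session should capture the flush and merge messages in one file that is easy to post.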

Are you sure there are no exceptions being logged somewhere? Lucene runs merges with background threads (by default), and if those threads hit unhandled exceptions it's possible they are logged somewhere you wouldn't normally look?
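
One way to make such failures harder to miss is a JVM-wide uncaught exception handler. The sketch below uses only the standard Thread API. The class name BackgroundFailureLog is made up for this example, and it assumes the background thread lets the exception escape its run() method; if the exception is caught and handled inside the thread, this handler never fires.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper: records and prints any exception that kills a thread.
final class BackgroundFailureLog {
    // One entry per failed thread, formatted as "thread name: exception".
    static final List<String> FAILURES = new CopyOnWriteArrayList<String>();

    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread t, Throwable e) {
                FAILURES.add(t.getName() + ": " + e);
                System.err.println("Uncaught exception in thread " + t.getName());
                e.printStackTrace();
            }
        });
    }
}
```

Calling `BackgroundFailureLog.install()` once at application startup, before the writer is opened, would route these failures to stderr and keep them in memory for later inspection. Replacing the System.err call with the application's own logger is the obvious variation.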


Mike

John Byrne wrote:

MergeFactor and MergeDocs are left at default values. The indexing is incremental, i.e. whenever someone adds or modifies a file in the svn repository, the lucene index is updated, and the writer/reader/searcher are refreshed (closed and opened again).
According to the svn logs for the time the files were created, a few hundred files were added that day.
Overall, the index would have started out with around 150,000 to 200,000 documents, with anything from 100 to 1000 being added per day.
I don't optimize the index at any point, but I've never seen it get like this before.
Thanks,
John

Erick Erickson wrote:
What are your IndexWriter MergeFactor and MergeDocs set to? Also, are
the dates on all these files indicative of all being created during the same
indexing run?

Finally, how many documents are you indexing?

Best
Erick
On Tue, Feb 3, 2009 at 10:26 AM, John Byrne<[email protected]> wrote:
Hi,

I've got a weird problem with a lucene index, using 2.3.1. The index
contains 6660 files. I don't know how this happened. Maybe someone can tell me
something about the files themselves? (examples below)
On one day, between 10 and 40 of these files were being created every minute. The index updates are triggered by updates to an SVN repository, but
I can't find any corresponding activity in the SVN logs.

The lucene files all have names like this:

_1qsw.cfs
_1qsx.cfs
_1qsy.cfs
_1qsz.cfs
_1qt0.cfs

and are mostly < 5K in size.
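
The names themselves carry some information: Lucene names each segment with a leading underscore followed by a counter written in base 36, so consecutive names mean consecutively created segments. A minimal sketch of decoding that counter follows; the class and method names are made up for this example.

```java
// Hypothetical helper: decodes the base-36 counter in a segment file name.
final class SegmentNames {
    // Example: "_1qsw.cfs" decodes to 81392.
    static long counter(String fileName) {
        int dot = fileName.indexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        if (!base.startsWith("_")) {
            throw new IllegalArgumentException("not a segment file name: " + fileName);
        }
        // Character.MAX_RADIX is 36: digits 0-9 followed by letters a-z.
        return Long.parseLong(base.substring(1), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        String[] names = { "_1qsw.cfs", "_1qsx.cfs", "_1qsy.cfs", "_1qsz.cfs", "_1qt0.cfs" };
        for (String name : names) {
            System.out.println(name + " -> " + counter(name));
        }
    }
}
```

The five names listed above decode to 81392 through 81396, i.e. five segments created one right after another.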

My application uses just one instance each of
IndexReader/IndexWriter/IndexSearcher. From looking at
Can anyone shed any light on these files? I'm not too hopeful about fixing this index because we are getting "too many open files", even with an
unlimited ulimit, but any info/suggestions would be great. Thanks.
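
To put numbers on the file count and the "mostly < 5K" observation, a small directory scan is enough. This is only a sketch using java.io.File; the class name IndexDirStats and the size threshold parameter are assumptions for this example, and it only reads file metadata, so it does not open the index.

```java
import java.io.File;

// Hypothetical helper: counts compound segment files in an index directory.
final class IndexDirStats {
    // Returns { total .cfs files, .cfs files smaller than smallBytes }.
    static int[] countCfs(File indexDir, long smallBytes) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            throw new IllegalArgumentException("not a readable directory: " + indexDir);
        }
        int total = 0;
        int small = 0;
        for (File f : files) {
            if (f.isFile() && f.getName().endsWith(".cfs")) {
                total++;
                if (f.length() < smallBytes) {
                    small++;
                }
            }
        }
        return new int[] { total, small };
    }
}
```

Calling `IndexDirStats.countCfs(new File(indexPath), 5 * 1024)`, where `indexPath` stands for the index location, gives the total number of .cfs files and how many of them are under 5K.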

-John




---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
