There's org.apache.lucene.index.CheckIndex, which reports assorted stats about the index as well as checking it for correctness. It can also fix the index, but hopefully you won't need that. It will take quite a while to run on a large index.
What version of Lucene? Does a before/after (or large/small) directory listing give any clues?

--
Ian.

On Thu, Oct 27, 2011 at 12:44 PM, <[email protected]> wrote:
> Hi,
>
> I have an application with an index of 30 million docs. Every day I add
> around 1 million docs and remove the oldest 1 million, to keep it stable at
> 30 million.
> For the most part, doc fields are indexed and stored. Each doc weighs
> from a few KB to 1 MB (a few MB in some cases).
> I used to be able to keep the index at around 60 GB on disk, but
> recently it has tended to keep growing (90 GB). I can see that the
> expunge is doing what it should, because after it executes the size on
> disk does go down, but never as low as the previous day. From the
> outside it looks like a leak, but since I do not remove the docs I
> added during the day, it may be that the new docs are simply bigger than
> the old ones. Still, I am surprised by the increase.
>
> Are there any tools to dig into the index structure and help explain the
> space taken on disk?
> I was thinking of something that would help identify the terms that take
> up the most space, or some sort of dump that I could compare from one day
> to the next.
>
> Any help appreciated,
>
> thanks,
>
> vince
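One way to make the before/after directory listing easier to compare is to total the index files by extension, since Lucene names its per-segment files by type. In the 3.x file format, for example, stored fields live in .fdt/.fdx, the term dictionary in .tis/.tii, postings in .frq/.prx, and deletions in .del; if segments use compound files (.cfs), most of the data is bundled there and the breakdown is less informative. The exact extensions depend on the Lucene version, so treat that mapping as an assumption to check against your version's file-format docs. The class name `IndexSizeByExtension` is hypothetical; this is a plain-JDK sketch, not a Lucene tool:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Lucene): totals index file sizes by extension.
public class IndexSizeByExtension {

    /** Sums the sizes of regular files directly inside indexDir, keyed by extension ("none" if absent). */
    public static Map<String, Long> sizesByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys give stable, diffable output
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip anything that is not a regular file, e.g. subdirectories
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot + 1) : "none"; // e.g. segments_N has no extension
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> e : sizesByExtension(dir).entrySet()) {
            System.out.printf("%-8s %,15d bytes%n", e.getKey(), e.getValue());
        }
    }
}
```

Running it once a day and diffing the outputs would show whether the growth is concentrated in stored fields (consistent with bigger new docs) or elsewhere, e.g. in lingering deletion files or term data.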
