On Fri, Apr 26, 2002 at 07:05:23PM +0200, petite_abeille wrote:
> I guess it's really not my day...
> [...]
> Well, it's pretty ugly. Whatever I'm doing with Lucene in the previous
> package (com.lucene) is magnified many folds in rc4. After processing a
> paltry 16 objects I got:
>
> "SZFinder.findObjectsWithSpecificationInStore:
> java.io.FileNotFoundException: _2.f14 (Too many open files)"

Sounds like a pretty nasty situation. One suggestion: Doug is usually very helpful with problems like this IF you can first narrow down what is happening to the point that you can post a clear, specific, isolated test that consistently causes the problem. This makes sense: any effort to solve the problem will first involve isolating the bug, and that's a task you're best suited for, since you know your system best.

So maybe your best approach would be to take a copy of your system as above and start gradually stripping stuff out, testing after each change, until most of the application-specific code is gone but the problem still recurs consistently. Then post your code and ask whether some of the more Lucene-knowledgeable people can take a look.

Re: index integrity, I agree that it would be really, really nice to have some sort of "sanity" check. I have yet to actually get into the internals of the index, but I'd guess that there must be at least a superficial check, maybe some sort of format check.

If I were going to kludge something together, the first approach I'd take would be to just open the index and roll through all of the Documents in it, accessing all of the fields (or maybe just a few main fields per Document). I'm not sure what I'd *do* with the field values (printing them out to the screen might take a while), other than perhaps checking for nulls. But I suspect that if the code gets through that without throwing an exception or returning null values, then at least the index's internal format is intact.

Maybe the test code could also save the number of Lucene Document objects in the index between checks (and, of course, update this number when you add or remove documents), and make sure the index still has the right number of documents.

As for repairing an index, I think that's working sort of against the grain of Lucene.
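
For what it's worth, the sanity check described above (roll through every Document, touch a few fields, check for nulls, compare the document count) might look something like the sketch below. It is only a sketch under assumptions: IndexView is a hypothetical interface I made up so the example stands alone, not a Lucene type, and inMemory() is just a stand-in for trying it out. Against a real index you would write a small adapter over Lucene's IndexReader, and the expected count would have to be saved by your own application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class IndexSanityCheck {

    /** Hypothetical read-only view of an index (an assumption, not a Lucene class). */
    public interface IndexView {
        int maxDoc();                                 // one past the highest document number
        boolean isDeleted(int docId);                 // true if this slot holds a deleted document
        Map<String, String> storedFields(int docId);  // field name -> stored value
    }

    /**
     * Walks every live document, reads the required fields, and returns a
     * list of problems. An empty list means the walk completed, no required
     * field was null, and the live document count matched the expected one.
     */
    public static List<String> check(IndexView index, List<String> requiredFields,
                                     int expectedLiveDocs) {
        List<String> problems = new ArrayList<>();
        int live = 0;
        for (int id = 0; id < index.maxDoc(); id++) {
            if (index.isDeleted(id)) {
                continue; // deleted slots are skipped, not counted
            }
            live++;
            try {
                Map<String, String> fields = index.storedFields(id);
                if (fields == null) {
                    problems.add("doc " + id + ": no fields returned");
                    continue;
                }
                for (String name : requiredFields) {
                    if (fields.get(name) == null) {
                        problems.add("doc " + id + ": field '" + name + "' is null");
                    }
                }
            } catch (RuntimeException e) {
                // A real adapter would wrap read errors in a RuntimeException.
                problems.add("doc " + id + ": read failed: " + e);
            }
        }
        if (live != expectedLiveDocs) {
            problems.add("expected " + expectedLiveDocs + " documents, found " + live);
        }
        return problems;
    }

    /** In-memory stand-in for experiments; a null list entry plays a deleted document. */
    public static IndexView inMemory(final List<Map<String, String>> docs) {
        return new IndexView() {
            public int maxDoc() { return docs.size(); }
            public boolean isDeleted(int docId) { return docs.get(docId) == null; }
            public Map<String, String> storedFields(int docId) { return docs.get(docId); }
        };
    }
}
```

Passing this check would only tell you the index can be read end to end and still holds the number of documents you think it does; it says nothing about whether searches return the right hits.
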
In your case, it sounds like rebuilding the index is important, because you're using Lucene as a data store. I have some similar issues myself in some things I want to build (I end up wanting both a data store and a search index; ultimately I've chosen to keep a separate data store for the extra data). But Lucene is a search index, meant to be used more in a cache-like style, so there's an underlying assumption that the original data is always around to reindex. Thus, repairing an index is less important, since it is assumed you can always rebuild it.

I don't know much about the theory behind data store systems. It occurs to me that if you use Lucene as a data store, you'll always be working against the grain, always swimming upstream. Maybe it'd be a better idea to figure out some way to use Lucene as the indexing technology in a data store, the way traditional RDBMSes use indexes to speed up access.

Or possibly you should look at Xindice (http://xml.apache.org/xindice/), which is an XML database. You might find it easier to adapt that to your needs. I'm kind of curious as to how fast Xindice's XPath execution is, and what their indexing is based on; there might be a use for Lucene there.

Steven J. Owens
[EMAIL PROTECTED]