Hi, I'm using Lucene to index ~3.5M documents, over about 50 fields. The Lucene index itself is ~10.5GB, spread over ~7,000 files. Some of these files are large: several .prx files are ~1.5GB each.
Lucene runs on a dedicated server (Linux on a 1GHz Dell with 1GB RAM). Clients on other machines use RMI to perform reads and writes. Each night the server automatically performs an optimize.

The problem is that the optimize now dies with an OutOfMemory exception, even when the JVM heap size is set to its maximum of 2GB. I need to optimize, because as the number of Lucene files grows, search performance becomes unacceptable.

Search performance is also adversely affected because I've had to effectively single-thread reads and writes. I was using a simple read/write lock mechanism that allowed multiple readers to search simultaneously, but now more than 3-4 simultaneous readers will also cause an OutOfMemory condition. Searches can take as long as 30-40 seconds, and with single-threading, that's crippling the main client application.

Needless to say, the Lucene index is mission-critical and must run 24/7.

I've seen other posts in this same vein, but no definite consensus. Is my problem simply inadequate hardware? Should I run on a 64-bit platform, where I can allocate a Java heap of > 2GB? Or could there be something fundamentally "wrong" with my index? I should add that I've just spent about a week (!!) rebuilding from scratch, over all 3.5M documents.

-- Many thanks for any help!
Mark Florence