Hi, I'm using Lucene to index ~3.5M documents across about 50 fields. The Lucene
index itself is ~10.5GB, spread over ~7,000 files. Some of these files are
large: several of the .prx (position) files are ~1.5GB each.

Lucene runs on a dedicated server (Linux on a 1GHz Dell with 1GB RAM). Clients
on other machines use RMI to perform reads and writes. Each night the server
automatically runs an optimize.
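
For reference, the nightly job is essentially just a call to
IndexWriter.optimize(); the sketch below shows its shape (the index path and
analyzer are placeholders, not the real configuration):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class NightlyOptimize {
        public static void main(String[] args) throws Exception {
            // Open the existing index for writing (create == false).
            // "/data/lucene/index" is a placeholder path.
            IndexWriter writer = new IndexWriter("/data/lucene/index",
                                                 new StandardAnalyzer(),
                                                 false);
            try {
                // optimize() merges every segment into one; it needs extra
                // disk space and memory while the merge is running.
                writer.optimize();
            } finally {
                writer.close();
            }
        }
    }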

The problem is that the optimize now dies with an OutOfMemoryError, even when
the JVM heap is set to its maximum of 2GB. I do need to optimize, because as the
number of index files grows, search performance becomes unacceptable.
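
For context, the file count between optimizes is governed by the writer's merge
settings; I mean roughly this kind of thing (values illustrative only, not what
the server actually runs):

    import org.apache.lucene.index.IndexWriter;

    public class MergeSettings {
        // Illustrative merge-related settings; not the production values.
        public static void configure(IndexWriter writer) {
            // A lower mergeFactor keeps fewer segments (and so fewer files)
            // on disk, at the cost of more merging work while indexing.
            writer.setMergeFactor(5);
            // Compound-file format folds each segment's many files into a
            // single .cfs file, which holds down the total file count.
            writer.setUseCompoundFile(true);
        }
    }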

Search performance is also hurt because I've had to effectively single-thread
reads and writes. I was using a simple read/write lock mechanism that allowed
multiple readers to search simultaneously, but now anything more than 3-4
concurrent readers also triggers an OutOfMemoryError. Searches can take as long
as 30-40 seconds, and with single-threading that is crippling the main client
application.
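
By "read/write lock mechanism" I mean something with roughly this shape
(sketched here with java.util.concurrent; the class and method names are
placeholders, not the actual server code):

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class IndexGate {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        // Multiple readers may hold the read lock at the same time.
        public void search(Runnable searchTask) {
            lock.readLock().lock();
            try {
                searchTask.run();
            } finally {
                lock.readLock().unlock();
            }
        }

        // Writes (adds, deletes, optimize) take the exclusive write lock.
        public void write(Runnable writeTask) {
            lock.writeLock().lock();
            try {
                writeTask.run();
            } finally {
                lock.writeLock().unlock();
            }
        }
    }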

Needless to say, the Lucene index is mission-critical, and must run 24/7.

I've seen other posts in this same vein, but no definite consensus. Is my
problem simply inadequate hardware? Should I move to a 64-bit platform, where I
could allocate a Java heap larger than 2GB?

Or could there be something fundamentally "wrong" with my index? I should add
that I've just spent about a week (!!) rebuilding it from scratch over all 3.5M
documents.

-- Many thanks for any help! Mark Florence

