Fredrik wrote:
Opening the index with Luke, I can see the following:
Number of fields: 17
Number of documents: 1165726
Number of terms: 6721726

The size of the index is approx 5,3 GB.
Lucene version is 1.4.3.

The index contains Norwegian terms, but lots of inline HTML, etc
is probably increasing the index term count (should 'wash' these
unwanted terms away when indexing documents). The analysis below
shows that TermInfosReader.java:132 -> get() is trying to allocate
a huge memory slab.
[ ... ]
'need 532676624 bytes' means that something is allocating a 500Mb slab
of memory.

Lucene will try to allocate an array of 6721726/128 ~= 50k terms. The array alone will require a 200kB "slab" and the terms perhaps 1MB or more. But not 500MB. So I think something else is the culprit here.

Have you tried inserting print statements at the suspected allocations, to see how big the arrays actually are? Are you perhaps creating a new IndexSearcher per search, rather than reusing a single IndexSearcher? 1MB per query could quickly exhaust RAM if the GC can't keep up.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to