Michael's got some great points (he's the Lucene master), especially
possibly turning off norms if you can, but for an index like that I'd
recommend Solr. Solr sharding can be scaled to billions (a billion or
two at minimum, anyway) with few limitations (of course there are a few).
Plus it has further caching options, IndexReader refresh management,
etc. etc. etc.
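By "turning off norms" I mean something like the sketch below when the
documents are built (a rough, untested example against the Lucene 2.4 field
API; the field name is made up). Norms cost roughly one byte per document
per field, so across hundreds of millions of docs and several fields that
adds up:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // "title" is just an example field name.
    Field title = new Field("title", "some user entered text",
        Field.Store.NO, Field.Index.ANALYZED);
    // Drops the norms array (1 byte/doc/field) for this field, at the cost
    // of losing index-time boosts and length normalization on it.
    title.setOmitNorms(true);
    doc.add(title);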
- Mark
On Oct 29, 2008, at 10:30 PM, "Todd Benge" <[EMAIL PROTECTED]> wrote:
Thanks Mark. I appreciate the help.
I thought our memory may be low but wanted to verify whether there is
any way to control memory usage. I think we'll likely upgrade the
memory on the machines, but that may just delay the inevitable.
Wondering if anyone else has encountered similar issues with indices
of a similar size. I've been thinking we will need to move to a
clustered solution and have been reading up on Hadoop, Nutch, Solr &
Terracotta for possibilities such as index sharding.

Has anyone implemented a solution using Hadoop or Terracotta for a
large scale system? Just wondering the pros / cons of the various
approaches.
Thanks,
Todd
On Wed, Oct 29, 2008 at 6:07 PM, Mark Miller <[EMAIL PROTECTED]>
wrote:
The term, terminfo, IndexReader internals stuff is probably on the low end
compared to the size of your field caches (needed for sorting). If you are
sorting by String, I think the space needed is 32 bits x number of docs,
plus an array to hold all of the unique terms. So taking 300 million docs
(I know you are actually breaking it up smaller than that, but for example),
ignoring things like String chars being variable byte lengths, storing the
length, etc., and randomly picking 50,000 unique terms at 6 chars each:

32 bits x 300,000,000 + 50,000 x 6 x 16 bits ≈ 1,144.98 MB

That's per field you're sorting on. If you are sorting on an int field it
should be closer to 32 bits x num docs; for shorts, 16 bits x num docs, etc.
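If it helps, here's that back-of-the-envelope estimate as a tiny snippet
(the counts are just the assumed numbers from above, and it ignores
per-object overhead):

    long numDocs = 300000000L;   // assumed total docs
    long uniqueTerms = 50000L;   // assumed unique sort terms
    long avgTermChars = 6L;      // assumed average term length

    long ordBytes  = numDocs * 4L;                     // one 32-bit int per doc
    long termBytes = uniqueTerms * avgTermChars * 2L;  // 16-bit chars per term
    // prints roughly 1144.98 MB for one sorted String field
    System.out.printf("~%.2f MB%n", (ordBytes + termBytes) / (1024.0 * 1024.0));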
So you have those field caches, plus the IndexReader terminfo / term stuff,
plus whatever RAM your app needs beyond Lucene. 4 gig might just not *quite*
cut it, is my guess.
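If the term index itself turns out to be a big chunk (you mention a very
large number of unique terms), a couple of knobs may be worth a look. This
is just a sketch against what I believe the 2.4 API offers, with an example
index path, so double check it against your setup:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    // At index time: only every Nth term goes into the in-memory term index
    // (default 128; larger = less RAM for the term index, slower term lookups).
    IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/path/to/index"),
        new StandardAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setTermIndexInterval(256);

    // At search time: load only every Nth indexed term into memory; set this
    // before the reader does any term lookups.
    IndexReader reader = IndexReader.open(FSDirectory.getDirectory("/path/to/index"));
    reader.setTermInfosIndexDivisor(2);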
Todd Benge wrote:
There's usually only a couple of sort fields and a bunch of terms in the
various indices. The terms are user-entered on various media, so the
number of terms is very large.
Thanks for the help.
Todd
On 10/29/08, Todd Benge <[EMAIL PROTECTED]> wrote:
Hi,
I'm the lead engineer for search on a large website that uses Lucene for
search.
We're indexing about 300M documents in ~100 indices. The indices add
up to ~60G.

The indices are divided among 4 different MultiSearchers, with the
largest handling ~50G.
The code is basically like the following:
import java.io.File;
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.store.FSDirectory;

private static MultiSearcher searcher;

public void init(File[] files) throws IOException {
    IndexSearcher[] searchers = new IndexSearcher[files.length];
    int i = 0;
    for (File file : files) {
        searchers[i++] = new IndexSearcher(FSDirectory.getDirectory(file));
    }
    searcher = new MultiSearcher(searchers);
}

public Searcher getSearcher() {
    return searcher;
}
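For reference, a sorted query against it looks roughly like the following
(the field name is made up):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    // "uploadDate" is just an example sort field; each field we sort on ends
    // up with a FieldCache array sized to the number of documents.
    Sort sort = new Sort(new SortField("uploadDate", SortField.INT, true));
    TopDocs hits = getSearcher().search(
        new TermQuery(new Term("title", "lucene")), null, 10, sort);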
We're seeing a large amount of memory held by cached Term & TermInfo
objects in Lucene 2.4. Performance is good but servers are consistently
hanging with OutOfMemory errors.

We're allocating 4G of heap to each server.

Is there any way to control the amount of memory Lucene consumes for
caching? Any other suggestions on fixing the memory errors?
Thanks,
Todd