Hi Erick,

I ran this search just as the JVM started, so I'm thinking the JVM was still busy with some startup work and this search required more memory than was available at that time.
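One way to test the startup theory is to print how much heap is free right before running the search, once just after launch and again later. The class below is a hypothetical helper that relies only on java.lang.Runtime. maxMemory() reports the -Xmx ceiling (1GB here). totalMemory() minus freeMemory() is the heap currently in use, and that figure includes garbage the collector has not reclaimed yet, so "used" can overstate what is really live.

```java
/**
 * Hypothetical helper (not part of Lucene) for checking how much heap is
 * available before a search. Uses only java.lang.Runtime.
 */
class HeapHeadroom {

    /** Bytes the JVM could still allocate: the max heap minus what is in use now. */
    static long headroomBytes() {
        Runtime rt = Runtime.getRuntime();
        // Heap currently occupied, including garbage not yet collected.
        long used = rt.totalMemory() - rt.freeMemory();
        // maxMemory() reflects -Xmx (Long.MAX_VALUE if the JVM has no limit).
        return rt.maxMemory() - used;
    }

    /** One-line summary in megabytes, suitable for logging. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return String.format("max=%dMB used=%dMB headroom=%dMB",
                rt.maxMemory() / mb,
                (rt.totalMemory() - rt.freeMemory()) / mb,
                headroomBytes() / mb);
    }

    public static void main(String[] args) {
        // Log this right before the search call at startup and again later,
        // then compare the two readings.
        System.out.println(describe());
    }
}
```

If headroom is already small at startup, that supports the theory; if it is large and the search still fails, the query itself is the more likely culprit.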
If I run this search a while after the JVM has started, the query succeeds. I then pump in several similar queries on a different thread; they take a long time but still run to completion until one of them generates an OOM. Still, queries like this are just using too much memory.

As for clauses, BooleanQuery's max clause count was set to 9,000,000. I'm guessing that might have caused the excessive memory usage?

I'll try the explain you've suggested.

Thanks,
M

________________________________
From: Erick Erickson <erickerick...@gmail.com>
To: java-user@lucene.apache.org
Sent: Wednesday, April 1, 2009 6:51:13 PM
Subject: Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces

Think about putting this query in Luke and doing an "explain" for details, but....

I'm surprised this is working at all without throwing TooManyClauses errors. Under the covers, Lucene expands your wildcards to all terms in the field that match. For instance, assume your document field has the following terms:

aa ab ac ad ae

Now, searching for a* produces a clause like:

(aa OR ab OR ac OR ad OR ae)

in place of the a*.

So your query is generating ginormous OR clauses: one that contains every term in your content field starting with 'g', another with every term starting with 'h', etc. So I suspect that your content field doesn't have very many distinct terms in it....

As for why it's returning few entries, what does this part of your query return by itself? Since it's ANDed with your wildcard query, it might be what's limiting your results.

((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9) (+viewers:cpuser9 +viewers:cpuser4))

But I'm puzzled, because saying that you're getting OOM errors doesn't square very well with getting *any* results at all, so is there something else going on?

Best
er...@morequestionsthananswers.
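Erick's a* example can be reproduced with a toy model. The sketch below is plain Java; the class, method, and exception names are hypothetical, and it is not Lucene's actual rewrite code. It walks a sorted term list the way a term dictionary is scanned, collects every term that starts with the prefix into one OR clause, and enforces a per-expansion clause limit in the spirit of BooleanQuery's maxClauseCount, whose default in Lucene is 1024. With the limit raised to 9,000,000, each of the ten content:x* wildcards may expand to as many as 9,000,000 clauses instead of failing fast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Toy model (hypothetical, not Lucene's code) of rewriting a prefix wildcard
 * such as content:a* into one OR clause per matching term, with a clause limit
 * checked for each expansion separately.
 */
class WildcardExpansionDemo {

    /** Stand-in for Lucene's BooleanQuery.TooManyClauses error. */
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(String prefix, int limit) {
            super("prefix '" + prefix + "*' expands to more than " + limit + " clauses");
        }
    }

    /** Returns every term in the sorted dictionary that starts with prefix. */
    static List<String> expandPrefix(NavigableSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // Start at the first term >= prefix, like seeking in a term dictionary.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match either
            }
            if (clauses.size() >= maxClauses) {
                throw new TooManyClausesException(prefix, maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Renders the expansion the way the mail describes it: (aa OR ab OR ...). */
    static String toOrClause(List<String> clauses) {
        return "(" + String.join(" OR ", clauses) + ")";
    }

    public static void main(String[] args) {
        NavigableSet<String> terms =
                new TreeSet<>(List.of("aa", "ab", "ac", "ad", "ae", "ba", "bb"));
        List<String> a = expandPrefix(terms, "a", 1024);
        System.out.println("a* -> " + toOrClause(a)); // a* -> (aa OR ab OR ac OR ad OR ae)

        try {
            expandPrefix(terms, "a", 3); // deliberately tiny limit to show the failure mode
        } catch (TooManyClausesException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the limit applies per wildcard, the query in the original mail yields ten separate expansions, one per content:x* clause, and the rewritten query holds all of them at once.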
On Wed, Apr 1, 2009 at 1:31 PM, Lebiram <lebi...@ymail.com> wrote:

> Hi All,
>
> I have the following query on a 1GB index with about 12 million docs.
> As you can see, the terms consist of wildcards:
>
> query.toString()=+(+content:g* +content:h* +content:d* +content:s*
> +content:a* +content:w* +content:b* +content:c* +content:m* +content:e*)
> +((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
> (+viewers:cpuser9 +viewers:cpuser4))
>
> The Searcher is a MultiSearcher with 4 IndexSearchers pointing to 4
> different Lucene indexes.
> This search returns very few records, several tens of thousands of hits.
>
> Java is assigned 1GB max memory.
>
> Somehow this search eats the entire Java heap space and causes an OOM.
> Looking at JProfiler, I see the org.apache.lucene package generating a lot of
> objects, which I believe are taking all this space.
>
> Can anyone explain why this particular search might take so much
> memory?
> Is there anything I am doing wrong here?
> More importantly, is there anything I can do to improve this?
>
> -M
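To see what Erick's question amounts to, the required sender/viewers clause can be read as a plain boolean predicate. Inside each parenthesized group every '+' term is required, and the outer group, which has no '+' clauses of its own, matches when at least one of its three groups matches. The sketch below is hypothetical: Mail is a made-up stand-in for an indexed document with a single sender and a set of viewers, not a Lucene class.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical plain-Java reading of the clause
 * ((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
 *  (+viewers:cpuser9 +viewers:cpuser4)).
 */
class ParticipantFilterDemo {

    /** Illustrative stand-in for an indexed document; not a Lucene type. */
    static final class Mail {
        final String sender;
        final Set<String> viewers;

        Mail(String sender, Set<String> viewers) {
            this.sender = sender;
            this.viewers = viewers;
        }
    }

    /** Each group needs all of its '+' terms; the outer group needs at least one group. */
    static boolean matches(Mail m) {
        boolean nineToFour = m.sender.equals("cpuser9") && m.viewers.contains("cpuser4");
        boolean fourToNine = m.sender.equals("cpuser4") && m.viewers.contains("cpuser9");
        boolean bothViewers = m.viewers.contains("cpuser9") && m.viewers.contains("cpuser4");
        return nineToFour || fourToNine || bothViewers;
    }

    /** Number of documents this clause would match on its own. */
    static long countMatches(List<Mail> mails) {
        return mails.stream().filter(ParticipantFilterDemo::matches).count();
    }

    public static void main(String[] args) {
        List<Mail> mails = List.of(
                new Mail("cpuser9", Set.of("cpuser4")),            // first group
                new Mail("cpuser4", Set.of("cpuser9", "cpuser1")), // second group
                new Mail("cpuser1", Set.of("cpuser9", "cpuser4")), // third group
                new Mail("cpuser9", Set.of("cpuser1")));           // no group
        System.out.println(countMatches(mails)); // prints 3
    }
}
```

Since the top-level query requires both the wildcard part and this clause, no hit can fall outside the set this predicate accepts, which is why running it alone is a quick way to see what bounds the result count.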